ABSTRACT


We present a new role system for specifying changing refer- 
encing relationships of heap objects. The role of an object 
depends, in large part, on its aliasing relationships with other 
objects, with the role of each object changing as its aliasing 
relationships change. Roles therefore capture important ob- 
ject and data structure properties and provide useful infor- 
mation about how the actions of the program interact with 
these properties. Our role system enables the programmer 
to specify the legal aliasing relationships that define the set 
of roles that objects may play, the roles of procedure param- 
eters and object fields, and the role changes that procedures 
perform while manipulating objects. We present an inter- 
procedural, compositional, and context-sensitive role analy- 
sis algorithm that verifies that a program respects the role 
constraints. 
1 Introduction


Types capture important properties of the objects that pro- 
grams manipulate, increasing both the safety and readability 
of the program. Traditional type systems capture properties 
(such as the format of data items stored in the fields of the 
object) that are invariant over the lifetime of the object. But 
in many cases, properties that do change are as important
as properties that do not. Recognizing the benefit of
capturing these changes, researchers have developed systems in
which the type of the object changes as the values stored in 
its fields change or as the program invokes operations on the 
object [44, 43, 10, 47, 48, 4, 20, 13]. These systems integrate 
the concept of changing object states into the type system. 

The fundamental idea in this paper is that the state of 
each object also depends on the data structures in which it 
participates. Our type system therefore captures the refer- 
encing relationships that determine this data structure par- 
ticipation. As objects move between data structures, their 
types change to reflect their changing relationships with 
other objects. Our system uses roles to formalize the con- 
cept of a type that depends on the referencing relationships. 
Each role declaration provides complete aliasing information 
for each object that plays that role—in addition to specify- 
ing roles for the fields of the object, the role declaration also 
identifies the complete set of references in the heap that refer 
to the object. In this way roles generalize linear type sys- 
tems [45, 2, 30] by allowing multiple aliases to be statically 
tracked, and extend alias types [42, 46] with the ability to 
specify roles of objects that are the source of aliases. 

This approach attacks a key difficulty associated with 
state-based type systems: the need to ensure that any state 
change performed using one alias is correctly reflected in the 
declared types of the other aliases. Because each object’s 
role identifies all of its heap aliases, the analysis can verify 
the correctness of the role information at all remaining or 
new heap aliases after an operation changes the referencing 
relationships. 

Roles capture important object and data structure prop- 
erties, improving both the safety and transparency of the 
program. For example, roles allow the programmer to ex- 
press data structure consistency properties (with the proper- 
ties verified by the role analysis), to improve the precision of 
procedure interface specifications (by allowing the program- 
mer to specify the role of each parameter), to express precise 
referencing and interaction behaviors between objects (by 
specifying verified roles for object fields and aliases), and to 
express constraints on the coordinated movements of objects 
between data structures (by using the aliasing information in 
role definitions to identify legal data structure membership 
combinations). Roles may also aid program optimization by 
providing precise aliasing information. 

This paper makes the following contributions: 


• Role Concept: The concept that the state of an object
depends on its referencing relationships; specifically,
that objects with different heap aliases should be
regarded as having different states.


• Role Definition Language: It presents a language
for defining roles. The programmer can use this
language to express data structure invariants and
properties such as data structure participation.


• Programming Model: It presents a set of role
consistency rules. These rules give a programming model
for changing the role of an object and the circumstances
under which roles can be temporarily violated.


• Procedure Interface Specification Language: It
presents a language for specifying the initial context
and effects of each procedure. The effects summarize
the actions of the procedure in terms of the references
it changes and the regions of the heap that it affects.


• Role Analysis Algorithm: It presents an algorithm
for verifying that the program respects the constraints
given by a set of role definitions and procedure
specifications. The algorithm uses a data-flow analysis to
infer intermediate referencing relationships between
objects, allowing the programmer to focus on role changes
and procedure interfaces.


Figure 1: Role Reference Diagram for Scheduler


2 Example 


Figure 1 presents a role reference diagram for a process 
scheduler. Each box in the diagram denotes a disjoint set of 
objects of a given role. The labelled arrows between boxes 
indicate possible references between the objects in each set. 
As the diagram indicates, the scheduler maintains a list of 
live processes. A live process can be either running or sleep- 
ing. The running processes form a doubly-linked list, while 
sleeping processes form a binary tree. Both kinds of pro- 
cesses have proc references from the live list nodes LiveList. 
Header objects RunningHeader and SleepingTree simplify 
operations on the data structures that store the process ob- 
jects. 

As Figure 1 shows, data structure participation deter- 
mines the conceptual state of each object. In our exam- 
ple, processes that participate in the sleeping process tree 
data structure are classified as sleeping processes, while pro- 
cesses that participate in the running process list data struc- 
ture are classified as running processes. Moreover, move- 
ments between data structures correspond to conceptual 
state changes—when a process stops sleeping and starts run- 
ning, it moves from the sleeping process tree to the running 
process list. 


2.1 Role Definitions 


Figure 2 presents the role definitions for the objects in our
example.¹ Each role definition specifies the constraints that
an object must satisfy to play the role. Field constraints
specify the roles of the objects to which the fields refer, while
slot constraints identify the number and kind of aliases of the
object.


role LiveHeader {
  fields next : LiveList | null;
}
role LiveList {
  fields next : LiveList | null,
         proc : RunningProc | SleepingProc;
  slots LiveList.next | LiveHeader.next;
  acyclic next;
}
role RunningHeader {
  fields next : RunningProc | RunningHeader,
         prev : RunningProc | RunningHeader;
  slots RunningHeader.next | RunningProc.next,
        RunningHeader.prev | RunningProc.prev;
  identities next.prev, prev.next;
}
role RunningProc {
  fields next : RunningProc | RunningHeader,
         prev : RunningProc | RunningHeader;
  slots RunningHeader.next | RunningProc.next,
        RunningHeader.prev | RunningProc.prev,
        LiveList.proc;
  identities next.prev, prev.next;
}
role SleepingTree {
  fields root : SleepingProc | null;
  acyclic left, right;
}
role SleepingProc {
  fields left : SleepingProc | null,
         right : SleepingProc | null;
  slots SleepingProc.left | SleepingProc.right | SleepingTree.root,
        LiveList.proc;
  acyclic left, right;
}
role DeadProc { }


Figure 2: Role Definitions for a Scheduler


Role definitions may also contain two additional kinds of
constraints: identity constraints, which specify paths that
lead back to the object, and acyclicity constraints, which
specify paths with no cycles. In our example, the identity
constraint next.prev in the RunningProc role specifies the
cyclic doubly-linked list constraint that following the next,
then prev fields always leads back to the initial object. The
acyclic constraint left, right in the SleepingProc role
specifies that there are no cycles in the heap involving only
left and right edges. On the other hand, the list of running
processes must be cyclic because the next and prev fields of
its nodes can never be null.

The slot constraints specify the complete set of heap
aliases for the object. In our example, the slots of
RunningProc admit only running-list references (plus one
LiveList.proc reference), while the slots of SleepingProc admit
only tree references (plus one LiveList.proc reference); this
implies that no process can be simultaneously running and
sleeping.


¹ In general, each role definition would specify the static
class of objects that can play that role. To simplify the
presentation, we assume that all objects are instances of a
single class with a set of fields F.


In general, roles can capture data structure consistency 
properties such as disjointness and can prevent representa- 
tion exposure [8]. As a data structure description language, 
roles can naturally specify trees with additional pointers. 
Roles can also approximate non-tree data structures like 
sparse matrices. Because most role constraints are local, 
it is possible to inductively infer them from data structure 
instances. 


2.2 Roles and Procedure Interfaces 


Procedures specify the initial and final roles of their
parameters. The suspend procedure in Figure 3, for example,
takes two parameters: p, which references an object with role
RunningProc, and s, which references the SleepingTree. The
procedure changes the role of the object referenced by p to
SleepingProc, whereas the object referenced by s retains its
original role. To perform the role change, the procedure
removes p from the doubly-linked list of running processes
and inserts it into the SleepingTree data structure s. If the
procedure fails to perform the insertions or deletions
correctly, for instance by leaving an object in both
structures, the role analysis will report an error.

procedure suspend(p : RunningProc ->> SleepingProc,
                  s : SleepingTree)
local pp, pn, r;
{
  pp = p.prev; pn = p.next;
  r = s.root;
  p.prev = null; p.next = null;
  pp.next = pn; pn.prev = pp;
  s.root = p; p.left = r;
  setRole(p : SleepingProc);
}


Figure 3: Suspend Procedure 
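The pointer manipulations of Figure 3 can be replayed on ordinary objects to see which references change. The Python sketch below is only an illustration under stated assumptions: the class Node, the helper make_ring, and the use of None for null are hypothetical, the LiveList.proc references are not modelled, and no role checking is performed. It shows that after suspend the running list no longer contains p, while p has become the root of the sleeping tree.

```python
class Node:
    """Hypothetical heap object carrying every scheduler field; all start as None (null)."""
    def __init__(self, name):
        self.name = name
        self.next = self.prev = None     # running-list links
        self.left = self.right = None    # sleeping-tree links
        self.root = None                 # used only by the tree header

def make_ring(header, procs):
    """Link header and procs into a cyclic doubly-linked list (header first)."""
    ring = [header] + procs
    for a, b in zip(ring, ring[1:] + ring[:1]):
        a.next, b.prev = b, a

def suspend(p, s):
    """Statement-by-statement transcription of Figure 3 (no role checks)."""
    pp = p.prev
    pn = p.next
    r = s.root
    p.prev = None
    p.next = None
    pp.next = pn          # unlink p from the running list
    pn.prev = pp
    s.root = p            # p becomes the new root of the sleeping tree
    p.left = r            # the old tree hangs off p's left field

h, a, b, tree = Node("h"), Node("a"), Node("b"), Node("tree")
make_ring(h, [a, b])
suspend(a, tree)
print(h.next.name, tree.root.name)   # b a
```

Inserting at the root keeps the edge set a tree without any search, which matches the two tree statements in Figure 3.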


3 Abstract Syntax and Semantics of Roles 


In this section, we precisely define what it means for a given 
heap to satisfy a set of role definitions. In subsequent sec- 
tions we will use this definition as a starting point for a 
programming model and role analysis. 


3.1 Heap Representation 


We represent a concrete program heap as a finite directed
graph $H_c$ with $\mathit{nodes}(H_c)$ representing objects of the heap
and labelled edges representing heap references. A graph
edge $(o_1, f, o_2) \in H_c$ denotes a reference with field name $f$
from object $o_1$ to object $o_2$. To simplify the presentation, we
fix a global set of fields $F$ and assume that all objects have
all fields in $F$. We do not consider subtyping or dynamic
dispatch in this paper.


3.2 Role Representation 


Let $R$ denote the set of roles used in role definitions, let
$\mathrm{null}_R$ be a special symbol always denoting the null object
$\mathrm{null}_c$, and let $R_0 = R \cup \{\mathrm{null}_R\}$. We represent each role as
the conjunction of the following four kinds of constraints:


• Fields: For every field name $f \in F$ we introduce a
function $\mathit{field}_f : R \to 2^{R_0}$ denoting the set of roles
that objects of role $r \in R$ can reference through field
$f$. A field $f$ of role $r$ can be null if and only if
$\mathrm{null}_R \in \mathit{field}_f(r)$. The explicit use of $\mathrm{null}_R$ and the
possibility to specify a set of alternative roles for every field
allow roles to express both may and must referencing
relationships.


• Slots: Every role $r$ has $\mathit{slotno}(r)$ slots. A slot $\mathit{slot}_k(r)$ of
role $r \in R$ is a subset of $R \times F$. Let $o$ be an object of role
$r$ and $o'$ an object of role $r'$. A reference $(o', f, o) \in H_c$
can fill a slot $k$ of object $o$ if and only if $(r', f) \in \mathit{slot}_k(r)$.
An object with role $r$ must have each of its slots filled
by exactly one reference.


• Identities: Every role $r \in R$ has a set $\mathit{identities}(r) \subseteq
F \times F$. Identities are pairs of fields $(f, g)$ such that
following reference $f$ on object $o$ and then returning on
reference $g$ leads back to $o$.


• Acyclicities: Every role $r \in R$ has a set $\mathit{acyclic}(r) \subseteq F$
of fields along which cycles are forbidden.


3.3 Role Semantics


We define the semantics of roles as a conjunction of
invariants associated with role definitions. A concrete role
assignment is a map $\rho_c : \mathit{nodes}(H_c) \to R_0$ such that
$\rho_c(\mathrm{null}_c) = \mathrm{null}_R$.


Definition 1 Given a set of role definitions, we say that
heap $H_c$ is role consistent iff there exists a role assignment
$\rho_c : \mathit{nodes}(H_c) \to R_0$ such that for every $o \in \mathit{nodes}(H_c)$ the
predicate $\mathit{locallyConsistent}(o, H_c, \rho_c)$ is satisfied. We call any
such role assignment $\rho_c$ a valid role assignment.


The predicate $\mathit{locallyConsistent}(o, H_c, \rho_c)$ formalizes the
constraints associated with role definitions.


Definition 2 $\mathit{locallyConsistent}(o, H_c, \rho_c)$ iff all of the
following conditions are met. Let $r = \rho_c(o)$.


1) For every field $f \in F$ and $(o, f, o') \in H_c$, $\rho_c(o') \in
\mathit{field}_f(r)$.

2) Let $\{(o_1, f_1), \ldots, (o_k, f_k)\} = \{(o', f) \mid (o', f, o) \in
H_c\}$ be the set of all aliases of node $o$. Then $k =
\mathit{slotno}(r)$ and there exists some permutation $p$ of the
set $\{1, \ldots, k\}$ such that $(\rho_c(o_i), f_i) \in \mathit{slot}_{p_i}(r)$ for all $i$.


3) If $(o, f, o') \in H_c$, $(o', g, o'') \in H_c$, and
$(f, g) \in \mathit{identities}(r)$, then $o = o''$.


4) It is not the case that graph $H_c$ contains a cycle
$o_1, f_1, \ldots, o_s, f_s, o_1$ where $o_1 = o$ and
$f_1, \ldots, f_s \in \mathit{acyclic}(r)$.


Note that a role consistent heap may have multiple valid
role assignments $\rho_c$. However, in each of these role
assignments, every object $o$ is assigned exactly one role $\rho_c(o)$.
The existence of a role assignment $\rho_c$ with the property
$\rho_c(o_1) \neq \rho_c(o_2)$ thus implies $o_1 \neq o_2$. This is just one of
the ways in which roles make aliasing more predictable.
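Definition 2 is directly executable on a small finite heap. The Python sketch below is one possible encoding, not the role analysis of this paper: a heap is assumed to be a set of (source, field, target) triples, a role is assumed to be a dictionary with the hypothetical keys fields, slots, identities, and acyclic, and fields that a role does not mention are assumed to be null. Condition 2) is checked by trying every slot permutation, which is only practical for the handful of slots a role typically has.

```python
from itertools import permutations

NULL, NULL_ROLE = "null", "nullR"   # stand-ins for null_c and null_R

def aliases(heap, o):
    """All (source, field) pairs of references that point to o."""
    return [(src, f) for (src, f, dst) in heap if dst == o]

def on_cycle(heap, o, fields):
    """True if o lies on a cycle that uses only edges labelled by `fields`."""
    stack, seen = [o], {o}
    while stack:
        n = stack.pop()
        for (src, f, dst) in heap:
            if src != n or f not in fields:
                continue
            if dst == o:
                return True
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return False

def locally_consistent(o, heap, rho, roles):
    """Conditions 1)-4) of Definition 2 for a single non-null object o."""
    role = roles[rho[o]]
    # 1) each outgoing reference targets a role permitted for that field;
    #    fields the role does not mention may only be null (an assumption)
    for (src, f, dst) in heap:
        if src == o and rho[dst] not in role["fields"].get(f, {NULL_ROLE}):
            return False
    # 2) the aliases of o fill the slots one-to-one (brute-force permutation)
    al, slots = aliases(heap, o), role["slots"]
    if len(al) != len(slots):
        return False
    if not any(all((rho[src], f) in slots[p[i]] for i, (src, f) in enumerate(al))
               for p in permutations(range(len(slots)))):
        return False
    # 3) identities: following f and then g leads back to o
    for (f, g) in role["identities"]:
        for (src, f1, mid) in heap:
            if src == o and f1 == f and any(
                    s2 == mid and g1 == g and back != o for (s2, g1, back) in heap):
                return False
    # 4) no cycle through o along the acyclic fields
    return not on_cycle(heap, o, role["acyclic"])

# Hypothetical two-object ring: a header h and one node n.
LINK = {"next": {"Header", "Node"}, "prev": {"Header", "Node"}}
RING_SLOTS = [{("Header", "next"), ("Node", "next")},
              {("Header", "prev"), ("Node", "prev")}]
ring_role = {"fields": LINK, "slots": RING_SLOTS,
             "identities": [("next", "prev"), ("prev", "next")], "acyclic": set()}
roles = {"Header": ring_role, "Node": ring_role}
heap = {("h", "next", "n"), ("n", "next", "h"), ("h", "prev", "n"), ("n", "prev", "h")}
rho = {"h": "Header", "n": "Node", NULL: NULL_ROLE}
print(all(locally_consistent(o, heap, rho, roles) for o in ("h", "n")))  # True
```

For roles with many slots, a bipartite matching between aliases and slots would replace the permutation search.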


4 Role Properties 


Roles capture important properties of the objects and pro- 
vide useful information about how the actions of the program 
affect those properties. 


• Consistency Properties: Roles can ensure that the
program respects application-level data structure
consistency properties. The roles in our process scheduler,
for example, ensure that a process cannot be
simultaneously sleeping and running.


• Interface Changes: In many cases, the interface of an
object changes as its referencing relationships change.
In our process scheduler, for example, only running
processes can be suspended. Because procedures declare
the roles of their parameters, the role system can
ensure that the program uses objects correctly even as the
object's interface changes.


• Multiple Uses: Code factoring minimizes code
duplication by producing general-purpose classes (such as
the Java Vector and Hashtable classes) that can be
used in a variety of contexts. But this practice
obscures the different purposes that different instances of
these classes serve in the computation. Because each
instance's purpose is usually reflected in its relationships
with other objects, roles can often recapture these
distinctions.


• Correlated Relationships: In many cases, groups
of objects cooperate to implement a piece of
functionality. Standard type declarations provide some
information about these collaborations by identifying the
points-to relationships between related objects at the
granularity of classes. But roles can capture a much
more precise notion of cooperation, because they track
correlated state changes of related objects.


Programmers can use roles for specifying the membership 
of objects in data structures and the structural invariants 
of data structures. In both cases, the slot constraints are 
essential. 

When used to describe membership of an object in a data 
structure, slots specify the source of the alias from a data 
structure node that stores the object. By assigning different 
sets of roles to data structures used at different program 
points, it is possible to distinguish nodes stored in different 
data structure instances. As an object moves between data 
structures, the role of the object changes appropriately to 
reflect the new source of the alias. 

When describing nodes of data structures, slot constraints 
specify the aliasing constraints of nodes; this is enough to 
precisely describe a variety of data structures and approxi- 
mate many others. Property 16 below shows how to identify 
trees in role definitions even if tree nodes have additional 
aliases from other sets of nodes. It is also possible to define 
nodes which make up a compound data structure linked via 
disjoint sets of fields, such as threaded trees, sparse matrices 
and skip lists. 


Example 3 The following role definitions specify a sparse 
matrix of width and height at least 3. These definitions can 
be easily constructed from a sketch of a sparse matrix, as in 
Figure 4. 


Figure 4: Roles of Nodes of a Sparse Matrix 


role A1 {
  fields right : A2, down : A4;
  acyclic right, down;
}
role A2 {
  fields right : A2 | A3, down : A5;
  slots A1.right | A2.right;
  acyclic right, down;
}
role A3 {
  fields down : A6;
  slots A2.right;
  acyclic right, down;
}
role A4 {
  fields right : A5, down : A4 | A7;
  slots A1.down | A4.down;
  acyclic right, down;
}
role A5 {
  fields right : A5 | A6, down : A5 | A8;
  slots A4.right | A5.right, A2.down | A5.down;
  acyclic right, down;
}
role A6 {
  fields down : A6 | A9;
  slots A5.right, A3.down | A6.down;
  acyclic right, down;
}
role A7 {
  fields right : A8;
  slots A4.down;
  acyclic right, down;
}
role A8 {
  fields right : A8 | A9;
  slots A7.right | A8.right, A5.down;
  acyclic right, down;
}
role A9 {
  slots A8.right, A6.down;
  acyclic right, down;
}


Figure 5: Sketch of a Two-Level Skip List


Example 4 We next give role definitions for a two-level
skip list [36] sketched in Figure 5. 


role SkipList {
  fields one : OneNode | TwoNode | null,
         two : TwoNode | null;
}
role OneNode {
  fields one : OneNode | TwoNode | null,
         two : null;
  slots OneNode.one | TwoNode.one | SkipList.one;
  acyclic one, two;
}
role TwoNode {
  fields one : OneNode | TwoNode | null,
         two : TwoNode | null;
  slots OneNode.one | TwoNode.one | SkipList.one,
        TwoNode.two | SkipList.two;
  acyclic one, two;
}
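Because a node's role in the skip list is fixed by where its references come from, a candidate role assignment can be read off the incoming edges: a node with a two-reference pointing at it must be a TwoNode. The Python sketch below is illustrative only. It assumes a heap of (source, field, target) triples with the string "null" for the null object, the function name skip_list_roles is hypothetical, and it computes a candidate assignment without checking the field or acyclicity constraints above.

```python
NULL = "null"

def skip_list_roles(heap, header):
    """Candidate role assignment: a node with an incoming `two` reference is a
    TwoNode; every other node reached through `one` is a OneNode."""
    rho = {header: "SkipList", NULL: "nullR"}
    for (src, f, dst) in heap:
        if dst in (NULL, header):
            continue
        if f == "two":
            rho[dst] = "TwoNode"           # overrides an earlier OneNode guess
        else:
            rho.setdefault(dst, "OneNode")
    return rho

# Hypothetical list S -> a -> b -> c in which only b is also on level two.
heap = {("S", "one", "a"), ("a", "one", "b"), ("b", "one", "c"), ("c", "one", NULL),
        ("S", "two", "b"), ("a", "two", NULL), ("b", "two", NULL), ("c", "two", NULL)}
roles = skip_list_roles(heap, "S")
print(roles["a"], roles["b"], roles["c"])   # OneNode TwoNode OneNode
```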


4.1 Formal Properties of Roles 


In this section we identify some of the invariants expressible 
using sets of mutually recursive role definitions. A further 
study of role properties can be found in [31]. 

The following properties show some of the ways role spec- 
ifications make object aliasing more predictable. They are 
an immediate consequence of the semantics of roles. 


Property 5 (Role Disjointness)
If there exists a valid role assignment $\rho_c$ for $H_c$ such that
$\rho_c(o_1) \neq \rho_c(o_2)$, then $o_1 \neq o_2$.


The previous property gives a simple criterion for showing
that objects $o_1$ and $o_2$ are unaliased: find a valid role
assignment which assigns different roles to $o_1$ and $o_2$. This
use of roles generalizes the use of static types for pointer
analysis [12]. Since roles create a finer partition of objects
than a typical static type system, their potential for proving
absence of aliasing is even larger.


Property 6 (Disjointness Propagation)

If $(o_1, f, o_2), (o_3, g, o_4) \in H_c$, $o_1 \neq o_3$, and there exists a valid
role assignment $\rho_c$ for $H_c$ such that $\rho_c(o_1) = \rho_c(o_3) = r$ but
$\mathit{field}_f(r) \cap \mathit{field}_g(r) = \emptyset$, then $o_2 \neq o_4$.


Property 7 (Generalized Uniqueness)

If $(o_1, f, o_2), (o_3, g, o_4) \in H_c$, $o_1 \neq o_3$, and there exists a
valid role assignment $\rho_c$ such that $\rho_c(o_2) = \rho_c(o_4) = r$, but there
are no indices $i \neq j$ such that $(\rho_c(o_1), f) \in \mathit{slot}_i(r)$ and
$(\rho_c(o_3), g) \in \mathit{slot}_j(r)$, then $o_2 \neq o_4$.


A special case of Property 7 occurs when $\mathit{slotno}(r) = 1$; this
constrains all references to objects of role $r$ to be unique.

Role definitions induce a role reference diagram RRD 
which captures some, but not all, role constraints. 


Definition 8 (Role Reference Diagram)

Given a set of definitions of roles $R$, a role reference diagram
RRD is a directed graph with nodes $R_0$ and labelled edges
defined by


$$RRD = \{(r, f, r') \mid r' \in \mathit{field}_f(r) \text{ and } \exists i.\ (r, f) \in \mathit{slot}_i(r')\} \cup \{(r, f, \mathrm{null}_R) \mid \mathrm{null}_R \in \mathit{field}_f(r)\}$$
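Definition 8 can be computed mechanically from the role definitions. The Python sketch below assumes a hypothetical dictionary encoding of roles (keys fields and slots, with the string "nullR" standing for $\mathrm{null}_R$); it keeps a field edge only if some slot of the target role accepts a reference from the source role along that field. The sample roles are a simplified fragment of Figure 2 (the LiveList.proc slot is omitted) plus an invented role Stray whose field no slot accepts.

```python
NULL_ROLE = "nullR"

def role_reference_diagram(roles):
    """Edges (r, f, r2) of Definition 8: r2 is allowed in field f of r, and some
    slot of r2 accepts a reference from role r along f (or r2 is nullR)."""
    rrd = set()
    for r, role in roles.items():
        for f, targets in role["fields"].items():
            for r2 in targets:
                if r2 == NULL_ROLE:
                    rrd.add((r, f, NULL_ROLE))
                elif any((r, f) in slot for slot in roles[r2]["slots"]):
                    rrd.add((r, f, r2))
    return rrd

tree_slot = {("SleepingProc", "left"), ("SleepingProc", "right"), ("SleepingTree", "root")}
roles = {
    "SleepingTree": {"fields": {"root": {"SleepingProc", NULL_ROLE}}, "slots": []},
    "SleepingProc": {"fields": {"left": {"SleepingProc", NULL_ROLE},
                                "right": {"SleepingProc", NULL_ROLE}},
                     "slots": [tree_slot]},
    "Stray": {"fields": {"x": {"SleepingProc"}}, "slots": []},
}
rrd = role_reference_diagram(roles)
print(("Stray", "x", "SleepingProc") in rrd)   # False: no slot accepts Stray.x
```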


Each role reference diagram is a refinement of the
corresponding class diagram in a statically typed language,
because it partitions classes into multiple roles according to
their referencing relationships. The sets $\rho_c^{-1}(r)$ of objects
with role $r$ change during program execution, reflecting the
changing referencing relationships of objects.

Role definitions give more information than a role reference
diagram. Slot constraints specify not only that objects
of role $r_1$ can reference objects of role $r_2$ along field $f$, but
also give cardinalities on the number of references from other
objects. In addition, role definitions include identity and
acyclicity constraints, which are not present in role reference
diagrams.


Property 9 Let $\rho_c$ be any valid role assignment. Define


$$G = \{(\rho_c(o_1), f, \rho_c(o_2)) \mid (o_1, f, o_2) \in H_c\}$$
Then $G$ is a subgraph of RRD.


It follows from Property 9 that roles give an approximation 
of may-reachability among heap objects. 


Property 10 (May Reachability)

If there is a valid role assignment $\rho_c : \mathit{nodes}(H_c) \to R_0$ such
that $\rho_c(o_1) \neq \rho_c(o_2)$ where $o_1, o_2 \in \mathit{nodes}(H_c)$ and there is
no path from $\rho_c(o_1)$ to $\rho_c(o_2)$ in the role reference diagram
RRD, then there is no path from $o_1$ to $o_2$ in $H_c$.
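Property 10 turns into a reachability query on the role reference diagram. The Python sketch below uses hypothetical function names and a hand-written, partial fragment of the scheduler diagram as its RRD; it answers "may $o_1$ reach $o_2$?" conservatively, returning False only when the two roles differ and no diagram path connects them.

```python
from collections import deque

def rrd_path_exists(rrd, start, goal):
    """Breadth-first search over role reference diagram edges (r, f, r2)."""
    seen, queue = {start}, deque([start])
    while queue:
        r = queue.popleft()
        if r == goal:
            return True
        for (src, _field, dst) in rrd:
            if src == r and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return False

def may_reach(rho, o1, o2, rrd):
    """False only when Property 10 proves that no heap path leads from o1 to o2."""
    if rho[o1] == rho[o2]:
        return True          # Property 10 does not apply; answer conservatively
    return rrd_path_exists(rrd, rho[o1], rho[o2])

rrd = {("LiveHeader", "next", "LiveList"), ("LiveList", "next", "LiveList"),
       ("LiveList", "proc", "RunningProc"), ("LiveList", "proc", "SleepingProc"),
       ("SleepingTree", "root", "SleepingProc"), ("SleepingProc", "left", "SleepingProc")}
rho = {"h": "LiveHeader", "t": "SleepingTree", "p": "RunningProc"}
print(may_reach(rho, "t", "p", rrd), may_reach(rho, "h", "p", rrd))   # False True
```

The search runs on the diagram, whose size depends only on the role definitions, not on the heap.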


The next property shows the advantage of explicitly speci- 
fying null references in role definitions. While the ability to 
specify acyclicity is provided by the acyclic constraint, it 
is also possible to indirectly specify must-cyclicity. 


Property 11 (Must Cyclicity)

Let $F_0 \subseteq F$ and let $R_{\mathit{cyc}} \subseteq R$ be a set of nodes in the role
reference diagram RRD such that for every node $r \in R_{\mathit{cyc}}$, if
$(r, f, r') \in RRD$ then $r' \in R_{\mathit{cyc}}$. If $\rho_c$ is a valid role
assignment for $H_c$, then every object $o_1 \in \mathit{nodes}(H_c)$ with
$\rho_c(o_1) \in R_{\mathit{cyc}}$ is a member of a cycle in $H_c$ with edges from $F_0$.


The following property shows that roles can specify a form 
of must-reachability among the sets of objects with the same 
role. 


Property 12 (Downstream Path Termination)

Assume that for some set of fields $F_0 \subseteq F$ there are sets of
nodes $R_{\mathit{inter}} \subseteq R$, $R_{\mathit{final}} \subseteq R_0$ of the role reference diagram
RRD such that for every node $r \in R_{\mathit{inter}}$:


1. $F_0 \subseteq \mathit{acyclic}(r)$


2. if $(r, f, r') \in RRD$ for $f \in F_0$, then $r' \in R_{\mathit{inter}} \cup R_{\mathit{final}}$


Let $\rho_c$ be a valid role assignment for $H_c$. Then every path in
$H_c$ starting from an object $o_1$ with role $\rho_c(o_1) \in R_{\mathit{inter}}$ and
containing only edges labelled with $F_0$ is a prefix of a path
that terminates at some object $o_2$ with $\rho_c(o_2) \in R_{\mathit{final}}$.


Property 13 (Upstream Path Termination)

Assume that for some set of fields $F_0 \subseteq F$ there are sets of
nodes $R_{\mathit{inter}} \subseteq R$, $R_{\mathit{init}} \subseteq R_0$ of the role reference diagram
RRD such that for every node $r \in R_{\mathit{inter}}$:


1. $F_0 \subseteq \mathit{acyclic}(r)$
2. if $(r', f, r) \in RRD$ for $f \in F_0$, then $r' \in R_{\mathit{inter}} \cup R_{\mathit{init}}$


Let $\rho_c$ be a valid role assignment for $H_c$. Then every path
in $H_c$ terminating at an object $o_2$ with $\rho_c(o_2) \in R_{\mathit{inter}}$ and
containing only edges labelled with $F_0$ is a suffix of a path
which started at some object $o_1$, where $\rho_c(o_1) \in R_{\mathit{init}}$.


The next two properties guarantee reachability properties 
by which there must exist at least one path in the heap, 
rather than stating properties of all paths as in Properties 
12 and 13. 


Property 14 (Downstream Must Reachability)

Assume that for some set of fields $F_0 \subseteq F$ there are sets of
roles $R_{\mathit{inter}} \subseteq R$, $R_{\mathit{final}} \subseteq R_0$ of the role reference diagram
RRD such that for every node $r \in R_{\mathit{inter}}$:


1. $F_0 \subseteq \mathit{acyclic}(r)$
2. there exists $f \in F_0$ such that $\mathit{field}_f(r) \subseteq R_{\mathit{inter}} \cup R_{\mathit{final}}$


Let $\rho_c$ be a valid role assignment for $H_c$. Then for every
object $o_1$ with $\rho_c(o_1) \in R_{\mathit{inter}}$ there is a path in $H_c$ with edges
from $F_0$ from $o_1$ to some object $o_2$ where $\rho_c(o_2) \in R_{\mathit{final}}$.


Property 15 (Upstream Must Reachability)

Assume that for some set of fields $F_0 \subseteq F$ there are sets
of nodes $R_{\mathit{inter}} \subseteq R$, $R_{\mathit{init}} \subseteq R$ of the role reference diagram
RRD such that for every node $r \in R_{\mathit{inter}}$:


1. $F_0 \subseteq \mathit{acyclic}(r)$
2. there exists $k$ such that $\mathit{slot}_k(r) \subseteq (R_{\mathit{inter}} \cup R_{\mathit{init}}) \times F$


Let $\rho_c$ be a valid role assignment for $H_c$. Then for every
object $o_2$ with $\rho_c(o_2) \in R_{\mathit{inter}}$ there is a path in $H_c$ from
some object $o_1$ with $\rho_c(o_1) \in R_{\mathit{init}}$ to the object $o_2$.


Trees are a class of data structures especially suited for 
static analysis. Roles can express graphs that are not trees, 
but it is useful to identify trees as certain sets of mutually 
recursive role definitions. 


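Property 16 (Treeness)
Let $R_{\mathit{tree}} \subseteq R$ be a set of roles and $F_0 \subseteq F$ a set of fields such
that for every $r \in R_{\mathit{tree}}$:


1. $F_0 \subseteq \mathit{acyclic}(r)$
2. $|\{i \mid \mathit{slot}_i(r) \cap (R_{\mathit{tree}} \times F_0) \neq \emptyset\}| \leq 1$


Let $\rho_c$ be a valid role assignment for $H_c$ and
$S \subseteq \{(n_1, f, n_2) \mid (n_1, f, n_2) \in H_c,\ \rho_c(n_1), \rho_c(n_2) \in R_{\mathit{tree}},\ f \in F_0\}$.
Then $S$ is a set of trees.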
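Property 16 has two parts that can be checked separately: its hypotheses are conditions on role definitions, and its conclusion is a condition on concrete edges. The Python sketch below uses hypothetical function names and assumes roles encoded as dictionaries with acyclic and slots entries; the sample SleepingProc role follows Figure 2, while the RunningProc entry is a deliberately simplified stand-in. The helper is_forest checks the conclusion directly: every node has at most one incoming edge and no edge cycle exists.

```python
def satisfies_treeness(roles, r_tree, f0):
    """Conditions 1 and 2 of Property 16 for every role in r_tree."""
    for r in r_tree:
        role = roles[r]
        if not set(f0) <= set(role["acyclic"]):
            return False
        tree_slots = [slot for slot in role["slots"]
                      if any(src in r_tree and f in f0 for (src, f) in slot)]
        if len(tree_slots) > 1:
            return False
    return True

def is_forest(edges):
    """In-degree at most one and no cycles over the given (src, f, dst) edges."""
    parent = {}
    for (src, _f, dst) in edges:
        if dst in parent:
            return False              # two incoming tree edges
        parent[dst] = src
    for start in parent:              # with in-degree <= 1, any cycle shows up
        seen, n = set(), start        # as a loop in some parent chain
        while n in parent:
            if n in seen:
                return False
            seen.add(n)
            n = parent[n]
    return True

roles = {
    "SleepingProc": {"acyclic": {"left", "right"},
                     "slots": [{("SleepingProc", "left"), ("SleepingProc", "right"),
                                ("SleepingTree", "root")},
                               {("LiveList", "proc")}]},
    "RunningProc": {"acyclic": set(),
                    "slots": [{("RunningProc", "next")}, {("RunningProc", "prev")}]},
}
print(satisfies_treeness(roles, {"SleepingProc"}, {"left", "right"}))   # True
print(satisfies_treeness(roles, {"RunningProc"}, {"next", "prev"}))     # False
```

The RunningProc check fails already at condition 1, since the running list is intentionally cyclic; the extra LiveList.proc slot of SleepingProc does not count, because LiveList is outside $R_{\mathit{tree}}$.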


5 A Programming Model 


In this section we define what it means for an execution of 
a program to respect the role constraints. This definition 
is complicated by the need to allow the program to tem- 
porarily violate the role constraints during data structure 
manipulations. Our approach is to let the program violate 
the constraints for objects referenced by local variables or 
parameters, but require all other objects to satisfy the con- 
straints. 

We first present a simple imperative language with dy- 
namic object allocation and give its operational semantics. 
We then specify additional statement preconditions that en- 
force the role consistency requirements. 


5.1 A Simple Imperative Language 


Our core language contains, as basic statements, Load
(x=y.f), Store (x.f=y), Copy (x=y), and New (x=new). All
variables are references to objects in the global heap and all
assignments are reference assignments. We use an elementary
test statement combined with nondeterministic choice
and iteration to express if and while statements, using the
usual translation [22, 1]. We represent the control flow of
programs using control-flow graphs.

A program is a collection of procedures $proc \in Proc$.
Procedures change the global heap but do not return values.
Every procedure $proc$ has a list of parameters $\mathit{param}(proc) =
\{\mathit{param}_i(proc)\}_i$ and a list of local variables $\mathit{local}(proc)$. We
use $\mathit{var}(proc)$ to denote $\mathit{param}(proc) \cup \mathit{local}(proc)$. A
procedure definition specifies the initial role $\mathit{preR}_k(proc)$ and the
final role $\mathit{postR}_k(proc)$ for every parameter $\mathit{param}_k(proc)$. We
use $proc_j$ for indices $j \in \mathbb{N}$ to denote activation records of
procedure $proc$. We further assume that there are no
modifications of parameter variables, so every parameter references
the same object throughout the lifetime of the procedure
activation.


Example 17 The following kill procedure removes a pro- 
cess from both the doubly linked list of running processes 
and the list of all active processes. This is indicated by the 
transition from RunningProc to DeadProc. 


procedure kill(p : RunningProc ->> DeadProc,
               l : LiveHeader)
local prev, current, cp, nxt, lp, ln;
{
  // find 'p' in 'l'
  prev = l; current = l.next;
  cp = current.proc;
  while (cp != p) {
    prev = current;
    current = current.next;
    cp = current.proc;
  }
  // remove 'current' and 'p' from active list
  nxt = current.next;
  prev.next = nxt; current.next = null;
  current.proc = null;
  setRole(current : IsolatedCell);
  // remove 'p' from running list
  lp = p.prev; ln = p.next;
  p.prev = null; p.next = null;
  lp.next = ln; ln.prev = lp;
  setRole(p : DeadProc);
}


5.2 Operational Semantics


In this section we give the operational semantics for our lan- 
guage. We focus on the first three columns in Figures 6 and 
7; the safety conditions in the fourth column are detailed in 
Section 5.4. 


Figure 6 gives the small-step operational semantics for
the basic statements. We use $A \uplus B$ to denote the union
$A \cup B$ where the sets $A$ and $B$ are disjoint. The program
state consists of the stack $s$ and the concrete heap $H_c$. The
stack $s$ is a sequence of pairs $p@proc_j$, where
$p \in N_{CFG}(proc)$ is a program point and $proc_j \in Proc \times \mathbb{N}$
is an activation record of procedure $proc$. Program points
$p \in N_{CFG}(proc)$ are nodes of the control-flow graphs. There
is one control-flow graph for every procedure $proc$. An edge
of the control-flow graph $(p, p') \in E_{CFG}(proc)$ indicates that
control may transfer from point $p$ to point $p'$. We write
$p: \mathit{stat}$ to state that program point $p$ contains a statement
$\mathit{stat}$. The control-flow graph of each procedure contains
special program points entry and exit indicating procedure
entry and exit, with no statements associated with them. We
assume that all conditions are of the form x==y or !(x==y)
where x and y are either variables or a special constant null
which always points to the $\mathrm{null}_c$ object.


The concrete heap is either an error heap error, or a
non-error heap. A non-error heap $H_c \subseteq N \times F \times N \cup ((Proc \times \mathbb{N}) \times
V \times N)$ is a directed graph with labelled edges, where nodes
represent objects and procedure activation records, whereas
edges represent heap references and local variables. An edge
$(o_1, f, o_2) \in N \times F \times N$ denotes a reference from object $o_1$
to object $o_2$ via field $f \in F$. An edge $(proc_j, x, o) \in H_c$
means that local variable x in activation record $proc_j$ points
to object $o$.


A load statement x=y.f makes the variable x point to
node $o_f$, which is referenced by the f field of object $o_y$,
which is in turn referenced by variable y. A store statement
x.f=y replaces the reference along field f in object $o_x$ by
a reference to object $o_y$ that is referenced by y. The copy
statement x=y copies a reference to object $o_y$ into variable
x. The statement x=new creates a new object $o_n$ with all
fields initially referencing $\mathrm{null}_c$, and makes x point to $o_n$.
The statement test(c) allows execution to proceed only if
condition c is satisfied.
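The state changes of the four basic statements can be sketched as functions on a heap of (source, label, target) triples in which activation records and objects share one edge set, as in the concrete heap described in this section. The Python sketch below is an illustration under stated assumptions: the field set FIELDS, string object names, and the frame name main are hypothetical, and it implements only the heap update of Figure 6, ignoring both the test statement and the role consistency conditions of the fourth column. Dereferencing null raises an exception, standing in for a stuck transition.

```python
import itertools

NULL = "null"                 # the null object null_c
FIELDS = ("next", "prev")     # hypothetical global field set F
_fresh = itertools.count()    # supply of fresh object names

def target(heap, src, label):
    """The unique d with (src, label, d) in the heap."""
    return next(d for (s, l, d) in heap if s == src and l == label)

def deref(heap, frame, var):
    """Object referenced by a variable; dereferencing null is an error."""
    o = target(heap, frame, var)
    if o == NULL:
        raise ValueError(f"null dereference of {var}")
    return o

def redirect(heap, src, label, dst):
    """Replace the edge src --label--> _ by src --label--> dst."""
    kept = {e for e in heap if not (e[0] == src and e[1] == label)}
    return kept | {(src, label, dst)}

def load(heap, frame, x, y, f):    # x = y.f
    return redirect(heap, frame, x, target(heap, deref(heap, frame, y), f))

def store(heap, frame, x, f, y):   # x.f = y
    return redirect(heap, deref(heap, frame, x), f, target(heap, frame, y))

def copy(heap, frame, x, y):       # x = y
    return redirect(heap, frame, x, target(heap, frame, y))

def new(heap, frame, x):           # x = new
    o = f"o{next(_fresh)}"
    null_fields = {(o, f, NULL) for f in FIELDS}   # all fields start at null_c
    return redirect(heap, frame, x, o) | null_fields

h = {("main", v, NULL) for v in ("x", "y", "z")}
h = new(h, "main", "x")
h = new(h, "main", "y")
h = store(h, "main", "x", "next", "y")
h = load(h, "main", "z", "x", "next")
print(target(h, "main", "z") == target(h, "main", "y"))   # True
```

Every function returns a new edge set, which mirrors the $H_c \to H'_c$ style of the rules rather than mutating the heap in place.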


Figure 7 describes the semantics of procedure calls. A
procedure call pushes a new activation record onto the stack,
inserts it into the heap, and initializes the parameters.
Procedure entry initializes local variables. Procedure exit
removes the activation record from the heap and the stack.


5.3. Onstage and Offstage Objects 


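Figure 6: Semantics of Basic Statements

Load, p: x=y.f
  Transition: $\langle p@proc_i; s, H_c \uplus \{\langle proc_i, x, o_x\rangle\}\rangle \longrightarrow \langle p'@proc_i; s, H_c'\rangle$
  Constraints: $x, y \in local(proc)$, $\langle proc_i, y, o_y\rangle, \langle o_y, f, o_f\rangle \in H_c$, $(p, p') \in E_{cfg}(proc)$, $H_c' = H_c \uplus \{\langle proc_i, x, o_f\rangle\}$
  Role consistency: $accessible(o_f, proc_i, H_c)$, $con(H_c', offstage(H_c'))$

Store, p: x.f=y
  Transition: $\langle p@proc_i; s, H_c \uplus \{\langle o_x, f, o_f\rangle\}\rangle \longrightarrow \langle p'@proc_i; s, H_c'\rangle$
  Constraints: $x, y \in local(proc)$, $\langle proc_i, x, o_x\rangle, \langle proc_i, y, o_y\rangle \in H_c$, $(p, p') \in E_{cfg}(proc)$, $H_c' = H_c \uplus \{\langle o_x, f, o_y\rangle\}$
  Role consistency: $o_f \in onstage(H_c, proc_i)$, $con(H_c', offstage(H_c'))$

Copy, p: x=y
  Transition: $\langle p@proc_i; s, H_c \uplus \{\langle proc_i, x, o_x\rangle\}\rangle \longrightarrow \langle p'@proc_i; s, H_c'\rangle$
  Constraints: $x \in local(proc)$, $y \in var(proc)$, $\langle proc_i, y, o_y\rangle \in H_c$, $(p, p') \in E_{cfg}(proc)$, $H_c' = H_c \uplus \{\langle proc_i, x, o_y\rangle\}$
  Role consistency: $con(H_c', offstage(H_c'))$

New, p: x=new
  Transition: $\langle p@proc_i; s, H_c \uplus \{\langle proc_i, x, o_x\rangle\}\rangle \longrightarrow \langle p'@proc_i; s, H_c'\rangle$
  Constraints: $x \in local(proc)$, $o_n$ fresh, $(p, p') \in E_{cfg}(proc)$, $H_c' = H_c \uplus \{\langle proc_i, x, o_n\rangle\} \uplus nulls$, $nulls = \{o_n\} \times F \times \{null_c\}$
  Role consistency: $con(H_c', offstage(H_c'))$

Test, p: test(c)
  Transition: $\langle p@proc_i; s, H_c\rangle \longrightarrow \langle p'@proc_i; s, H_c\rangle$
  Constraints: $satisfied(c, proc_i, H_c)$, $(p, p') \in E_{cfg}(proc)$
  Role consistency: $con(H_c, offstage(H_c))$

  $satisfied(x{==}y, proc_i, H_c)$ iff $\{o \mid \langle proc_i, x, o\rangle \in H_c\} = \{o \mid \langle proc_i, y, o\rangle \in H_c\}$
  $satisfied(!(x{==}y), proc_i, H_c)$ iff not $satisfied(x{==}y, proc_i, H_c)$
  $accessible(o, proc_i, H_c) := (\exists p \in param(proc) : \langle proc_i, p, o\rangle \in H_c)$ or not $(\exists proc'_j, \exists v \in var(proc') : \langle proc'_j, v, o\rangle \in H_c)$

Figure 7: Semantics of Procedure Call

Procedure entry
  Transition: $\langle p@proc_i; s, H_c\rangle \longrightarrow \langle p'@proc_i; s, H_c \uplus nulls\rangle$
  Constraints: $nulls = \{\langle proc_i, v, null_c\rangle \mid v \in local(proc)\}$, $(p, p') \in E_{cfg}(proc)$
  Role consistency: $con(H_c, offstage(H_c))$

Procedure call, p: proc'$(x_k)_k$
  Transition: $\langle p@proc_i; s, H_c\rangle \longrightarrow \langle entry@proc'_j; p'@proc_i; s, H_c'\rangle$
  Constraints: $proc'_j$ fresh in $p@proc_i; s$, $(p, p') \in E_{cfg}(proc)$, $\forall k : \langle proc_i, x_k, o_k\rangle \in H_c$, $p_k = param_k(proc')$, $H_c' = H_c \uplus \{\langle proc'_j, p_k, o_k\rangle\}_k$
  Role consistency: $conW(ra, H_c, S)$, $ra = \{(o_k, preR_k(proc'))\}_k$, $S = offstage(H_c) \cup \{o_k\}_k$

Procedure exit
  Transition: $\langle p@proc_i; s, H_c\rangle \longrightarrow \langle s, H_c \setminus AF\rangle$
  Constraints: $AF = \{\langle proc_i, v, n\rangle \mid \langle proc_i, v, n\rangle \in H_c\}$
  Role consistency: $conW(ra, H_c, S)$, $ra = \{(parnd_k(proc_i), postR_k(proc))\}_k$, $S = offstage(H_c) \cup \{o \mid \langle proc_i, v, o\rangle \in H_c\}$
  where $parnd_k(proc_i) = o$ such that $\langle proc_i, param_k(proc), o\rangle \in H_c$

At every program point the set of all objects of heap $H_c$ can
be partitioned into:

1. onstage objects ($onstage(H_c)$), referenced by a local
variable or parameter of some activation frame:

   $onstage(H_c, proc_i) := \{o \mid \exists x \in var(proc) : \langle proc_i, x, o\rangle \in H_c\}$
   $onstage(H_c) := \bigcup_{proc_i} onstage(H_c, proc_i)$

2. offstage objects ($offstage(H_c)$), unreferenced by local
or parameter variables:

   $offstage(H_c) := nodes(H_c) \setminus onstage(H_c)$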
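The partition can be illustrated with a minimal sketch. It assumes a hypothetical encoding in which the concrete heap is a set of (source, label, target) triples, activation records are plain strings such as "proc_1", and the null object is excluded from the partition; none of these names come from the paper.

```python
from typing import Set, Tuple

# Hypothetical encoding of a concrete heap H_c: a set of (source, label, target)
# triples. Triples whose source is an activation record (e.g. "proc_1") are
# variable references; all other triples are field references between objects.
Edge = Tuple[str, str, str]
NULL = "null"  # assumption: null_c is left out of the onstage/offstage partition


def onstage(heap: Set[Edge], frames: Set[str]) -> Set[str]:
    """Objects referenced by a local variable or parameter of some frame."""
    return {dst for (src, _, dst) in heap if src in frames and dst != NULL}


def nodes(heap: Set[Edge], frames: Set[str]) -> Set[str]:
    """All objects mentioned in the heap, excluding frames and null."""
    mentioned = {src for (src, _, _) in heap} | {dst for (_, _, dst) in heap}
    return mentioned - frames - {NULL}


def offstage(heap: Set[Edge], frames: Set[str]) -> Set[str]:
    """offstage(H_c) = nodes(H_c) minus onstage(H_c)."""
    return nodes(heap, frames) - onstage(heap, frames)
```

For the heap {(proc_1, x, o1), (o1, f, o2)}, o1 is onstage and o2 is offstage, even though o2 is reachable from x.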


Onstage objects need not have correct roles. Offstage objects 
must have correct roles assuming some role assignment for 
onstage objects. 


Definition 18 Given a set of role definitions and a set of
objects $S_c \subseteq nodes(H_c)$, we say that heap $H_c$ is role
consistent for $S_c$, and we write $con(H_c, S_c)$, iff there exists a
role assignment $\rho_c : nodes(H_c) \to R_0$ such that the predicate
$locallyConsistent(o, H_c, \rho_c, S_c)$ is satisfied for every
object $o \in S_c$.


We define $locallyConsistent(o, H_c, \rho_c, S_c)$ to generalize the
$locallyConsistent(o, H_c, \rho_c)$ predicate, weakening the
acyclicity condition.


Definition 19 $locallyConsistent(o, H_c, \rho_c, S_c)$ holds iff
conditions 1), 2), and 3) of Definition 2 are satisfied and the
following condition holds:

4') It is not the case that graph $H_c$ contains a cycle
$o_1, f_1, \ldots, o_s, f_s, o_1$ such that
$o_1 = o$, $f_1, \ldots, f_s \in acyclic(r)$ (where $r = \rho_c(o)$), and
additionally $o_1, \ldots, o_s \in S_c$.


Here $S_c$ is the set of objects that are not allowed
to create a cycle; objects in $nodes(H_c) \setminus S_c$ are exempt from
the acyclicity condition. The $locallyConsistent(o, H_c, \rho_c, S_c)$
and $con(H_c, S_c)$ predicates are monotonic in $S_c$, so a larger
$S_c$ implies a stronger invariant. For $S_c = nodes(H_c)$, consistency
for $S_c$ is equivalent to heap consistency from Definition 1.
Note that the role assignment $\rho_c$ specifies roles even
for objects $o \in nodes(H_c) \setminus S_c$, because the role of
$o$ may influence the role consistency of objects in $S_c$ that
are adjacent to $o$.
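The weakened acyclicity condition 4') can be checked with a plain depth-first search restricted to objects in $S_c$. This is a sketch of that one condition only: conditions 1) to 3) of Definition 2 are not modeled, `acyclic_fields` is a hypothetical parameter standing for $acyclic(\rho_c(o))$, and the triple encoding of the heap is an assumption.

```python
from typing import Dict, Set, Tuple

# Sketch only: heap edges are (source, field, target) triples (hypothetical encoding).
Edge = Tuple[str, str, str]


def violates_acyclicity(o: str, heap: Set[Edge], acyclic_fields: Set[str],
                        s_c: Set[str]) -> bool:
    """True iff heap has a cycle o = o1 -f1-> ... -fs-> o1 whose fields are all
    in acyclic_fields and whose objects all lie in s_c (condition 4')."""
    if o not in s_c:
        return False
    # Keep only edges that could take part in a forbidden cycle.
    succ: Dict[str, Set[str]] = {}
    for (src, f, dst) in heap:
        if f in acyclic_fields and src in s_c and dst in s_c:
            succ.setdefault(src, set()).add(dst)
    # Depth-first search from the successors of o, looking for o again.
    stack = list(succ.get(o, ()))
    seen: Set[str] = set()
    while stack:
        n = stack.pop()
        if n == o:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(succ.get(n, ()))
    return False
```

Shrinking `s_c` can only remove cycles from consideration, which mirrors the monotonicity remark above: a larger $S_c$ yields a stronger invariant.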

At procedure calls, the role declarations for parameters
restrict the set of potential role assignments. We therefore
generalize $con(H_c, S_c)$ to $conW(ra, H_c, S_c)$, which restricts
the set of role assignments $\rho_c$ considered for heap
consistency.


Definition 20 Given a set of role definitions, a heap $H_c$, a
set $S_c \subseteq nodes(H_c)$, and a partial role assignment
$ra \subseteq S_c \times R$, we say that the heap $H_c$ is consistent
with ra for $S_c$, and write $conW(ra, H_c, S_c)$, iff there exists a
(total) role assignment $\rho_c : nodes(H_c) \to R_0$ such that
$ra \subseteq \rho_c$ and for every object $o \in S_c$ the predicate
$locallyConsistent(o, H_c, \rho_c, S_c)$ is satisfied.


5.4 Role Consistency 


We are now able to precisely state the role consistency
requirements that must be satisfied for program execution.
The role consistency requirements are in the fourth column of
Figures 6 and 7. We assume the operational semantics is
extended with transitions leading to a program state with
heap $error_c$ whenever role consistency is violated.


5.4.1 Offstage Consistency 


At every program point, we require $con(H_c, offstage(H_c))$ to
be satisfied. This means that offstage objects have correct
roles, but onstage objects may have their roles temporarily
violated.


5.4.2 Reference Removal Consistency 


The Store statement x.f=y has the following safety
precondition. When a reference $\langle o_x, f, o_f\rangle \in H_c$, where
$\langle proc_i, x, o_x\rangle \in H_c$, is removed from the heap, both $o_x$
and $o_f$ must be referenced from the current procedure
activation record. It is sufficient to verify this condition for
$o_f$, as $o_x$ is already onstage by definition. The reference
removal consistency condition enables the completion of the
role change for $o_f$ after the reference $\langle o_x, f, o_f\rangle$ is removed
and ensures that heap references are introduced and removed
only between onstage objects.
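The Store precondition can be sketched on the same hypothetical triple encoding used above. The helper names `lookup`, `store`, and `RoleError` are illustrative, and treating an old reference to null as exempt from the check is an assumption of this sketch rather than something the text states.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]   # hypothetical (source, label, target) encoding
NULL = "null"                 # assumption: removing a reference to null needs no check


class RoleError(Exception):
    """Raised when the reference removal precondition fails."""


def lookup(heap: Set[Edge], frame: str, var: str) -> str:
    """Object referenced by variable `var` of activation record `frame`."""
    return next(d for (s, v, d) in heap if s == frame and v == var)


def store(heap: Set[Edge], frame: str, x: str, f: str, y: str) -> Set[Edge]:
    """Execute x.f = y in `frame`, enforcing the reference removal check."""
    o_x = lookup(heap, frame, x)
    o_y = lookup(heap, frame, y)
    removed = {(s, g, d) for (s, g, d) in heap if s == o_x and g == f}
    for (_, _, o_f) in removed:
        # o_x is onstage by construction; only o_f needs to be checked.
        if o_f != NULL and not any(s == frame and d == o_f for (s, _, d) in heap):
            raise RoleError(f"x.f=y would remove a reference to offstage {o_f}")
    return (heap - removed) | {(o_x, f, o_y)}
```

Holding the old target in some local variable (here `z`) before the store is exactly what makes the update legal.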


5.4.3 Procedure Call Consistency


Our programming model ensures role consistency across pro- 
cedure calls using the following protocol. 

A procedure call $proc'(x_1, \ldots, x_p)$ in Figure 7 requires the
role consistency precondition $conW(ra, H_c, S_c)$, where the
partial role assignment ra requires objects $o_k$, corresponding
to parameters $x_k$, to have roles $preR_k(proc')$ expected by the
callee, and $S_c = offstage(H_c) \cup \{o_k\}_k$ for $\langle proc_i, x_k, o_k\rangle \in H_c$.

To ensure that the callee $proc'_j$ never observes incorrect
roles, we impose an accessibility condition for the callee's
Load statements (see the fourth column of Figure 6). The
accessibility condition prohibits access to any object $o$
referenced by some local variable of a stack frame other than
$proc'_j$, unless $o$ is referenced by some parameter of $proc'_j$.
Provided that this condition is not violated, the callee $proc'_j$
only accesses objects with correct roles, even though objects
that it does not access may have incorrect roles. In Section 7
we show how the role analysis ensures that the accessibility
condition is never violated.
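A small sketch of the accessibility test, again on a hypothetical triple encoding with frame names as strings. It follows the prose reading above (only variables of frames other than the current one can make an object inaccessible); the function and parameter names are illustrative, not the paper's.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]  # hypothetical (source, label, target) encoding


def accessible(o: str, frame: str, params: Set[str], heap: Set[Edge],
               frames: Set[str]) -> bool:
    """Accessibility of object o for a Load executed in `frame`.

    o is accessible if some parameter of `frame` references it, or if no
    variable of any *other* stack frame references it.
    """
    via_param = any(src == frame and var in params and dst == o
                    for (src, var, dst) in heap)
    held_elsewhere = any(src in frames and src != frame and dst == o
                         for (src, _, dst) in heap)
    return via_param or not held_elsewhere
```

In the usage below, o2 is held by the caller's variable b and is not passed as a parameter, so the callee may not load it, even though it can reach o2 through o1.f.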

At the procedure exit point (Figure 7), we require correct
roles for all objects referenced by the current activation
frame $proc'_j$. This implies that heap operations performed
by $proc'_j$ preserve heap consistency for all objects accessed
by $proc'_j$.


5.4.4 Explicit Role Check 


The programmer can specify a stronger invariant at any
program point using the statement roleCheck($x_1, \ldots, x_n$, ra). As
Figure 8 indicates, roleCheck requires the $conW(ra, H_c, S_c)$
predicate to be satisfied for the supplied partial role
assignment ra, where $S_c = offstage(H_c) \cup \{o_k\}_k$ for the objects $o_k$
referenced by the given local variables $x_k$.


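Figure 8: Operational Semantics of Explicit Role Check

Explicit role check, p: roleCheck($x_1, \ldots, x_n$, ra)
  Transition: $\langle p@proc_i; s, H_c\rangle \longrightarrow \langle p'@proc_i; s, H_c\rangle$
  Constraints: $(p, p') \in E_{cfg}(proc)$
  Role consistency: $conW(ra, H_c, S)$, $S = offstage(H_c) \cup \{o \mid \langle proc_i, x_k, o\rangle \in H_c\}$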


5.5 Instrumented Semantics 


We expect the programmer to have a specific role assignment 
in mind when writing the program, with this role assignment 
changing as the statements of the program change the ref- 
erencing relationships. So when the programmer wishes to 
change the role of an object, he or she writes a program that 
brings the object onstage, changes its referencing relation- 
ships so that it plays a new role, then puts it offstage in its 
new role. The roles of other objects do not change.²

To support these programmer expectations, we introduce 
an augmented programming model in which the role assign- 
ment p. is conceptually part of the program’s state. The 
role assignment changes only if the programmer changes it 
explicitly using the setRole statement. The augmented pro- 
gramming model has an underlying instrumented semantics 
as opposed to the original semantics. 


Example 21 The original semantics allows asserting differ- 
ent roles at different program points even if the structure of 
the heap was not changed, as in the following procedure foo. 


role A1 { fields f : B1; }
role B1 { slots A1.f; }
role A2 { fields f : B2; }
role B2 { slots A2.f; }
procedure foo()
var x, y;
{
  x = new; y = new;
  x.f = y;
  roleCheck(x,y, x:A1,y:B1);
  roleCheck(x,y, x:A2,y:B2);
}


Both role checks would succeed since each of the spec- 
ified partial role assignments can be extended to a 
valid role assignment. On the other hand, the check 
roleCheck(x,y, x:A1,y:B2) would fail. 

The procedure foo in the instrumented semantics can be
written as follows.

procedure foo()
var x, y;
{
  x = new; y = new;
  x.f = y;
  setRole(x:A1); setRole(y:B1);
  roleCheck(x,y, x:A1,y:B1);
  setRole(x:A2); setRole(y:B2);
  roleCheck(x,y, x:A2,y:B2);
}

The setRole statement makes the role change of an object
explicit.

²An extension to the programming model supports cascading
role changes in which a single role change propagates
through the heap changing the roles of offstage objects; see
Section 8.2.


The instrumented semantics extends the concrete heap
$H_c$ with a role assignment $\rho_c$. Figure 9 outlines the changes
in the instrumented semantics with respect to the original
semantics. We introduce a new statement setRole(x:r),
which modifies a role assignment $\rho_c$, giving $\rho_c[o_x \mapsto r]$,
where $o_x$ is the object referenced by x. All statements
other than setRole preserve the current role assignment.
For every consistency condition $conW(ra, H_c, S_c)$ in the
original semantics, the instrumented semantics uses the
corresponding condition $conW(\rho_c \cup ra, H_c, S_c)$ and fails if $\rho_c$
is not an extension of ra. Here we consider $con(H_c, S)$
to be a shorthand for $conW(\emptyset, H_c, S)$. For example, the
new role consistency condition for the Copy statement
x=y is $conW(\rho_c, H_c', offstage(H_c'))$. The New statement
assigns the role identifier unknown to the newly created object $o_n$.
By definition, a node with role unknown does not satisfy the
locallyConsistent predicate. This means that setRole must
be used to set a valid role of $o_n$ before $o_n$ moves offstage.
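The role-assignment bookkeeping of the instrumented semantics can be modeled in a few lines. This is a model of the semantics for illustration, not a runtime design, and all names (`new_object`, `set_role`, `with_partial`, `may_go_offstage`) are hypothetical. The $conW$ check itself (the locallyConsistent predicate) is not modeled; only the rule that $\rho_c \cup ra$ must be defined and that unknown objects cannot go offstage.

```python
from typing import Dict, Optional

UNKNOWN = "unknown"  # role that x=new gives to the fresh object o_n

RoleAssignment = Dict[str, str]  # rho_c: object -> role (hypothetical encoding)


def new_object(rho: RoleAssignment, o_n: str) -> RoleAssignment:
    """x=new: the fresh object starts with role unknown."""
    return {**rho, o_n: UNKNOWN}


def set_role(rho: RoleAssignment, o_x: str, r: str) -> RoleAssignment:
    """setRole(x:r): rho_c[o_x -> r]; no other object changes role."""
    return {**rho, o_x: r}


def with_partial(rho: RoleAssignment, ra: RoleAssignment) -> Optional[RoleAssignment]:
    """rho_c U ra, or None when rho_c is not an extension of ra (check fails)."""
    if any(rho.get(o) != r for o, r in ra.items()):
        return None
    return dict(rho)


def may_go_offstage(rho: RoleAssignment, o: str) -> bool:
    """A node whose role is still unknown is never locally consistent."""
    return rho.get(o, UNKNOWN) != UNKNOWN
```

Replaying the instrumented version of foo: after setRole(x:A1) and setRole(y:B1), the check with x:A2, y:B2 fails until the second pair of setRole statements runs, unlike in the original semantics.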

By introducing an instrumented semantics we are not sug- 
gesting an implementation that explicitly stores roles of ob- 
jects at run-time. We instead use the instrumented seman- 
tics as the basis of our role analysis and ensure that all role 
checks can be statically removed. Because the instrumented 
semantics is more restrictive than the original semantics, our 
role analysis is a conservative approximation of both the in- 
strumented semantics and the original semantics. 


6 Intraprocedural Role Analysis 


This section presents an intraprocedural role analysis algo- 
rithm. The goal of the role analysis is to statically verify 
the role consistency requirements described in the previous 
section. 

The key observation behind our analysis algorithm is that 
we can incrementally verify role consistency of the concrete 
heap $H_c$ by ensuring role consistency for every node when it
goes offstage. This allows us to represent the statically un- 
bounded offstage portion of the heap using summary nodes 
with “may” references. In contrast, we use a “must” in- 
terpretation for references from and to onstage nodes. The 
exact representation of onstage nodes allows the analysis to 
verify role consistency in the presence of temporary viola- 
tions of role constraints. 

Figure 9: Instrumented Semantics

New, p: x=new
  Transition: $\langle p@proc_i; s, H_c \uplus \{\langle proc_i, x, o_x\rangle\}, \rho_c\rangle \longrightarrow \langle p'@proc_i; s, H_c', \rho_c'\rangle$
  Constraints: $x \in local(proc)$, $o_n$ fresh, $(p, p') \in E_{cfg}(proc)$, $H_c' = H_c \uplus \{\langle proc_i, x, o_n\rangle\} \uplus \{o_n\} \times F \times \{null_c\}$, $\rho_c' = \rho_c[o_n \mapsto unknown]$
  Role consistency: $conW(\rho_c', H_c', offstage(H_c'))$

Set role, p: setRole(x:r)
  Transition: $\langle p@proc_i; s, H_c, \rho_c\rangle \longrightarrow \langle p'@proc_i; s, H_c, \rho_c'\rangle$
  Constraints: $x \in local(proc)$, $\langle proc_i, x, o_x\rangle \in H_c$, $\rho_c' = \rho_c[o_x \mapsto r]$, $(p, p') \in E_{cfg}(proc)$
  Role consistency: $conW(\rho_c', H_c, offstage(H_c))$

Other statements, for every original transition $\langle s, H_c\rangle \longrightarrow \langle s', H_c'\rangle$ with condition $P \wedge conW(ra, \tilde{H}_c, S)$
  Transition: $\langle s, H_c, \rho_c\rangle \longrightarrow \langle s', H_c', \rho_c\rangle$
  Role consistency: $P \wedge conW(\rho_c \cup ra, \tilde{H}_c, S)$

Our analysis representation is a graph in which nodes
represent objects and edges represent references between
objects. There are two kinds of nodes: onstage nodes
represent onstage objects, with each onstage node representing
one onstage object; and offstage nodes, with each offstage
node corresponding to a set of objects that play the node's role.
To increase the precision of the analysis, the algorithm
occasionally generates multiple offstage nodes that represent
disjoint sets of objects playing the same role. Distinct
offstage nodes with the same role r represent disjoint sets of
objects of role r with different reachability properties from
onstage nodes.

We frame role analysis as a data-flow analysis operating
on a distributive lattice $\mathcal{P}(RoleGraphs)$ of sets of role graphs
with set union $\cup$ as the join operator. In this section we
present an algorithm for intraprocedural analysis. We use
$proc_c$ to denote the topmost activation record in a concrete
heap $H_c$. In Section 7 we generalize the algorithm to the
compositional interprocedural analysis.


6.1 Abstraction Relation 


Every data-flow fact $\mathcal{G} \subseteq RoleGraphs$ is a set of role graphs
$G \in \mathcal{G}$. Every role graph $G \in RoleGraphs$ is either a
bottom role graph $\bot_G$ representing the set of all concrete heaps
(including $error_c$), or a tuple $G = \langle H, \rho, K\rangle$ representing
non-error concrete heaps, where

• $H \subseteq N \times F \times N$ is the abstract heap with nodes N
representing objects and fields F. The abstract heap
H represents heap references $\langle n_1, f, n_2\rangle$ and variables of
the currently analyzed procedure $\langle proc, x, n\rangle$ where
$x \in local(proc)$. Null references are represented as references
to the abstract node null. We define abstract onstage nodes
$onstage(H) = \{n \mid \langle proc, x, n\rangle \in H, x \in local(proc) \cup param(proc)\}$
and abstract offstage nodes
$offstage(H) = nodes(H) \setminus onstage(H) \setminus \{proc, null\}$.

• $\rho : nodes(H) \to R_0$ is an abstract role assignment, with
$\rho(null) = null_R$;

• $K : nodes(H) \to \{i, s\}$ indicates the kind of each node;
when $K(n) = i$, then n is an individual node representing
at most one object, and when $K(n) = s$, n is a
summary node representing zero or more objects. We
require $K(proc) = K(null) = i$, and require all onstage
nodes to be individual, $K[onstage(H)] = \{i\}$.
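The role graph tuple can be mirrored by a small data structure. This sketch assumes string node names, uses the hypothetical constants `PROC` and `NULL` for the two distinguished nodes, and checks only the kind requirements listed above (not $\rho(null) = null_R$).

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

PROC, NULL = "proc", "null"   # distinguished abstract nodes (hypothetical names)
INDIVIDUAL, SUMMARY = "i", "s"

Edge = Tuple[str, str, str]


@dataclass
class RoleGraph:
    H: Set[Edge]          # abstract heap; edges leaving PROC are variables
    rho: Dict[str, str]   # abstract role assignment
    K: Dict[str, str]     # node kind: INDIVIDUAL or SUMMARY

    def nodes(self) -> Set[str]:
        return {a for (a, _, _) in self.H} | {b for (_, _, b) in self.H}

    def onstage(self, variables: Set[str]) -> Set[str]:
        # `variables` stands for local(proc) U param(proc)
        return {n for (s, x, n) in self.H if s == PROC and x in variables}

    def offstage(self, variables: Set[str]) -> Set[str]:
        return self.nodes() - self.onstage(variables) - {PROC, NULL}

    def well_formed(self, variables: Set[str]) -> bool:
        """proc, null, and every onstage node must be individual nodes."""
        required = {PROC, NULL} | self.onstage(variables)
        return all(self.K.get(n) == INDIVIDUAL for n in required)
```

A graph in which an onstage node is marked as a summary node is rejected, since an onstage node always stands for exactly one object.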


The abstraction relation $\alpha$ relates a pair $\langle H_c, \rho_c\rangle$ of
concrete heap and concrete role assignment with an abstract
role graph G.


Definition 22 We say that an abstract role graph G
represents concrete heap $H_c$ with role assignment $\rho_c$ and write
$\langle H_c, \rho_c\rangle \,\alpha\, G$, iff $G = \bot_G$ or: $H_c \neq error_c$, $G = \langle H, \rho, K\rangle$,
and there exists a function $h : nodes(H_c) \to nodes(H)$ such
that

1) $H_c$ is role consistent: $conW(\rho_c, H_c, offstage(H_c))$;

2) identity relations of onstage nodes with offstage nodes
hold: if $\langle o_1, f, o_2\rangle \in H_c$ and $\langle o_2, g, o_3\rangle \in H_c$ for
$o_1 \in onstage(H_c)$, $o_2 \in offstage(H_c)$, and
$\langle f, g\rangle \in identities(\rho_c(o_1))$, then $o_3 = o_1$;

3) h is a graph homomorphism: if $\langle o_1, f, o_2\rangle \in H_c$ then
$\langle h(o_1), f, h(o_2)\rangle \in H$;

4) an individual node represents at most one concrete
object: $K(n) = i$ implies $|h^{-1}(n)| \leq 1$;

5) h is a bijection on edges which originate or terminate at
onstage nodes: if $\langle n_1, f, n_2\rangle \in H$ and $n_1 \in onstage(H)$
or $n_2 \in onstage(H)$, then there exists exactly one
$\langle o_1, f, o_2\rangle \in H_c$ such that $h(o_1) = n_1$ and $h(o_2) = n_2$;

6) $h(null_c) = null$ and $h(proc_c) = proc$;

7) the abstract role assignment $\rho$ corresponds to the
concrete role assignment: $\rho_c(o) = \rho(h(o))$ for every object
$o \in nodes(H_c)$.


Note that the error heap $error_c$ can be represented only by
the bottom role graph $\bot_G$. The analysis uses $\bot_G$ to indicate
a potential role error.

Condition 3) implies that role graph edges are a
conservative approximation of concrete heap references. These edges
are in general "may" edges. Hence it is possible for an
offstage node n that $\langle n, f, n_1\rangle, \langle n, f, n_2\rangle \in H$ for $n_1 \neq n_2$. This
cannot happen when $n \in onstage(H)$ because of 5). Another
consequence of 5) is that an edge in H from an onstage node
$n_0$ to a summary node $n_1$ implies that $n_1$ represents at least
one object. Condition 2) strengthens 1) by requiring certain
identity constraints for onstage nodes to hold, as explained
in Section 6.2.4.

Figure 10: Abstraction Relation


Example 23 Consider the following role declaration for an 
acyclic list. 


role L { // List header
  fields first : LN | null;
}

role LN { // List node
  fields next : LN | null;
  slots LN.next | L.first;
  acyclic next;
}


Figure 10 shows a role graph and one of the concrete heaps 
represented by the role graph via homomorphism h. There 
are two local variables, prev and current, referencing dis- 
tinct onstage objects. Onstage objects are isomorphic to 
onstage nodes in the role graph. In contrast, there are two 
objects mapped to each of the summary nodes with role LN 
(shown as LN-labelled rectangles in Figure 10). Note that the 
sets of objects mapped to these two summary nodes are dis- 
joint. The first summary LN-node represents objects stored 
in the list before the object referenced by prev. The second 
summary LN-node represents objects stored in the list after 
the object referenced by current. 


Figure 11: Simulation Relation Between Abstract and
Concrete Execution


6.2 Transfer Functions 


The key complication in developing the transfer functions 
for the role analysis is to accurately model the movement 
of objects onstage and offstage. For example, a load state- 
ment x=y.f may cause the object referred to by y.f to move 
onstage. In addition, if x was the only reference to an on- 
stage object o before the statement executed, object o moves 
offstage after the execution of the load statement, and thus 
must satisfy the locallyConsistent predicate. 

The analysis uses an expansion relation $\Uparrow$ to model the
movement of objects onstage and a contraction relation $\Downarrow$
to model the movement of objects offstage. The expansion
relation uses the invariant that offstage nodes have correct
roles to generate possible aliasing relationships for the node
being pulled onstage. The contraction relation establishes
the role invariants for the node going offstage, allowing the
node to be merged into the other offstage nodes and
represented more compactly.

We present our role analysis as an abstract execution
relation $\rightsquigarrow$. The abstract execution ensures that the
abstraction relation $\alpha$ is a forward simulation relation [33] from
the space of concrete heaps with role assignments to the set
RoleGraphs. The simulation relation implies that the traces
of $\rightsquigarrow$ include the traces of the instrumented semantics $\longrightarrow$.
To ensure that the program does not violate constraints
associated with roles, it is thus sufficient to guarantee that $\bot_G$
is not reachable via $\rightsquigarrow$.

To prove that $\bot_G$ is not reachable in the abstract
execution, the analysis computes for every program point p a set
of role graphs $\mathcal{G}$ that conservatively approximates the
possible program states at point p. The transfer function for a
statement st is an image $[\![st]\!](\mathcal{G}) = \{G' \mid G \in \mathcal{G}, G \stackrel{st}{\rightsquigarrow} G'\}$.


The analysis computes the relation $\stackrel{st}{\rightsquigarrow}$ in three steps:

1. ensure that the relevant nodes are instantiated using the
expansion relation $\Uparrow$ (Section 6.2.1);

2. perform symbolic execution $\stackrel{st}{\Rightarrow}$ of the statement st
(Section 6.2.3);

3. merge nodes if needed using the contraction relation $\Downarrow$ to
keep the role graph bounded (Section 6.2.2).
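The three-step computation is a relational composition, and the transfer function is the image of a set of facts under it. The sketch below treats role graphs as opaque hashable values; `expand`, `symbolic`, and `contract` in the usage are toy stand-ins for $\Uparrow$, $\Rightarrow$, and $\Downarrow$, not implementations of them, and $\bot_G$ propagation is not modeled.

```python
from typing import Callable, Hashable, Iterable, Set

# A relation between role graphs, given as a successor function.
# Role graphs are opaque hashable values in this sketch.
Relation = Callable[[Hashable], Iterable[Hashable]]


def compose(*steps: Relation) -> Relation:
    """Relational composition: apply each step to every intermediate graph."""
    def run(g: Hashable) -> Set[Hashable]:
        current = {g}
        for step in steps:
            current = {g2 for g1 in current for g2 in step(g1)}
        return current
    return run


def transfer(relation: Relation, facts: Iterable[Hashable]) -> Set[Hashable]:
    """[[st]](G) = {G' | G in facts, G ~st~> G'}: the image of the fact set."""
    return {g2 for g in facts for g2 in relation(g)}
```

An expansion step may yield several graphs, and a symbolic step may yield none (a failing test), so a graph can contribute zero or many successors; for Copy and New the expansion step would simply be the identity `lambda g: {g}`.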


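Figure 11 shows how the abstraction relation $\alpha$ relates $\Uparrow$,
$\Rightarrow$, and $\Downarrow$ with the concrete execution $\longrightarrow$ in the instrumented
semantics. Assume that a concrete heap $\langle H_c, \rho_c\rangle$ is
represented by the role graph $G_1$. Then one of the role graphs $G_2$
obtained after expansion remains an abstraction of $\langle H_c, \rho_c\rangle$.
The symbolic execution $\Rightarrow$ followed by the contraction
relation $\Downarrow$ corresponds to the instrumented operational
semantics $\longrightarrow$.

Figure 12: Abstract Execution $\rightsquigarrow$

Load, x=y.f:
  $\langle H, \rho, K\rangle \stackrel{x=y.f}{\rightsquigarrow} G'$ if $\langle proc, x, n_x\rangle, \langle proc, y, n_y\rangle \in H$, $\langle H, \rho, K\rangle \Uparrow^{n_y, f} G_1$, $G_1 \stackrel{x=y.f}{\Rightarrow} G_2$, and $G_2 \Downarrow^{n_x} G'$

Copy and New, $st \in \{$x=y, x=new$\}$:
  $\langle H, \rho, K\rangle \stackrel{st}{\rightsquigarrow} G'$ if $\langle proc, x, n_1\rangle \in H$, $\langle H, \rho, K\rangle \stackrel{st}{\Rightarrow} G_1$, and $G_1 \Downarrow^{n_1} G'$

Other statements, $st \in \{$x.f=y, test(c), setRole(x:r), roleCheck($x_{1..p}$, ra)$\}$:
  $\langle H, \rho, K\rangle \stackrel{st}{\rightsquigarrow} G'$ if $\langle H, \rho, K\rangle \stackrel{st}{\Rightarrow} G'$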

Figure 12 shows rules for the abstract execution relation
$\rightsquigarrow$. Only the Load statement uses the expansion relation,
because the other statements operate on objects that are
already onstage. Load, Copy, and New statements may
remove a local variable reference from an object, so they use
the contraction relation to move the object offstage if needed.
For the rest of the statements, the abstract execution
reduces to the symbolic execution $\Rightarrow$ described in Section 6.2.3.


Nondeterminism and Failure The $\rightsquigarrow$ relation is not a
function because the expansion relation $\Uparrow$ can generate a set
of role graphs from a single role graph. Also, there might be
no $\rightsquigarrow$ transitions originating from a given state G if the
symbolic execution $\Rightarrow$ produces no results. This corresponds
to a trace which cannot be extended further due to a test
statement which fails in state G. This is in contrast to a
transition from G to $\bot_G$, which indicates a potential role
consistency violation or a null pointer dereference. We assume
that the $\Rightarrow$ and $\Downarrow$ relations contain the transition $\langle \bot_G, \bot_G\rangle$
to propagate the error role graph. In most cases we do not
write the explicit transitions to error states.


6.2.1 Expansion 

Figure 13 shows the expansion relation $\Uparrow$. Given a role
graph $\langle H, \rho, K\rangle$, expansion $\Uparrow^{n,f}$ attempts to produce a set of role
graphs $\langle H', \rho', K'\rangle$ in each of which $\langle n, f, n_0\rangle \in H'$ and
$K'(n_0) = i$. Expansion is used in abstract execution of the
Load statement. It first checks for a null pointer dereference
and reports an error if the check fails. If $\langle n, f, n'\rangle \in H$ and
$K(n') = i$ already hold, the expansion returns the original
state. Otherwise, $\langle n, f, n'\rangle \in H$ with $K(n') = s$. In that
case, the summary node $n'$ is first instantiated using the
instantiation relation $\uparrow^{n'}_{n_0}$. Next, the split relation $\|^{n_0}$ is
applied. Let $\rho(n_0) = r$. The split relation ensures that $n_0$ is
not a member of any cycle of offstage nodes which contains only
edges in $acyclic(r)$. We explain instantiation and split in more
detail below.


Instantiation Figure 14 presents the instantiation
relation. Given a role graph $G = \langle H, \rho, K\rangle$, instantiation $\uparrow^{n'}_{n_0}$
generates the set of role graphs $\langle H', \rho', K'\rangle$ such that each
concrete heap represented by $\langle H, \rho, K\rangle$ is represented by one
of the graphs $\langle H', \rho', K'\rangle$. Each of the new role graphs
contains a fresh individual node $n_0$ that satisfies localCheck.
The edges of $n_0$ are a subset of the edges from and to $n'$.

Let $H_0$ be a subset of the references between $n'$ and
onstage nodes, and let $H_1$ be a subset of the references between
$n'$ and offstage nodes. References in $H_0$ are moved from $n'$
to the new node $n_0$, because they represent at most one
reference, while references in $H_1$ are copied to $n_0$ because they
may represent multiple concrete heap references. Moving a
reference is formalized via the swing operation in Figure 14.
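The swing operation is small enough to transcribe directly from its definition in Figure 14. In this sketch edges are (source, field, target) triples and node names are strings; `move_onstage_refs` is a hypothetical helper showing how the moved set $H_0$ replaces its original edges, while the choice of the copied subset $H_1'$ is not enumerated.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]  # (source, field, target), hypothetical encoding


def swing(n_old: str, n_new: str, edges: Set[Edge]) -> Set[Edge]:
    """swing(n_old, n_new, H), read literally from its three-part definition."""
    out = {(n_new, f, b) for (a, f, b) in edges if a == n_old}
    out |= {(a, f, n_new) for (a, f, b) in edges if b == n_old}
    out |= {(n_new, f, n_new) for (a, f, b) in edges if a == n_old and b == n_old}
    return out


def move_onstage_refs(H: Set[Edge], n_prime: str, n0: str, H0: Set[Edge]) -> Set[Edge]:
    """Move the chosen onstage references H0 from summary node n_prime to n0."""
    return (H - H0) | swing(n_prime, n0, H0)
```

Note that a self-loop on $n_{old}$ produces three edges: one from $n_{new}$ to $n_{old}$, one from $n_{old}$ to $n_{new}$, and a self-loop on $n_{new}$, exactly as the union of the three sets in the definition prescribes.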

The instantiation of a single graph can generate multiple
role graphs depending on the choice of $H_0$ and $H_1$. The
number of graphs generated is limited by the existing references
of node $n'$ and by the localCheck requirement for $n_0$. This is
where our role analysis takes advantage of constraints
associated with role definitions to reduce the number of aliasing
possibilities that need to be considered.


Split The split relation is important for verifying opera- 
tions on data structures such as skip lists and sparse matri- 
ces. It is also useful for improving the precision of the initial 
set of role graphs on procedure entry (Section 7.2.1). 

The goal of the split relation is to exploit the acyclicity
constraints associated with role definitions. After a node $n_0$
is brought onstage, split represents the acyclicity condition
of $\rho(n_0)$ explicitly by eliminating impossible paths in the
role graph. It uses additional offstage nodes to encode the
reachability information implied by the acyclicity conditions.
This information can then be used even after the role of node
$n_0$ changes. In particular, it allows the acyclicity condition
of $n_0$ to be verified when $n_0$ moves offstage.


Example 24 Consider a role graph for an acyclic list with
nodes LN and a header node L. The instantiated node $n_0$ is
in the middle of the list. Figure 16 a) shows a role graph
with a single summary node representing all offstage
LN-nodes. Figure 16 b) shows the role graph after applying
the split relation. The resulting role graph contains two
LN summary nodes. The first LN summary node represents
objects definitely reachable from $n_0$ along next edges; the
second LN summary node represents objects definitely not
reachable from $n_0$.


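Figure 13: Expansion Relation

  $\langle H, \rho, K\rangle \Uparrow^{n,f} \bot_G$ if $\langle n, f, null\rangle \in H$
  $\langle H, \rho, K\rangle \Uparrow^{n,f} \langle H, \rho, K\rangle$ if $\langle n, f, n'\rangle \in H$ and $K(n') = i$
  $\langle H, \rho, K\rangle \Uparrow^{n,f} G'$ if $\langle n, f, n'\rangle \in H$, $n' \in offstage(H)$, $K(n') = s$, $\langle H, \rho, K\rangle \uparrow^{n'}_{n_0} \langle H_1, \rho_1, K_1\rangle$, $\langle H_1, \rho_1, K_1\rangle \|^{n_0} G'$, and $\langle n, f, n_0\rangle \in H_1$

Figure 14: Instantiation Relation

  $\langle H, \rho, K\rangle \uparrow^{n'}_{n_0} \langle H', \rho', K'\rangle$ where
  $H' = (H \setminus H_0) \cup H_0' \cup H_1'$
  $\rho' = \rho[n_0 \mapsto \rho(n')]$
  $K' = K[n_0 \mapsto i]$
  $localCheck(n_0, \langle H', \rho', K'\rangle)$
  $H_0 \subseteq H \cap (onstage(H) \times F \times \{n'\} \cup \{n'\} \times F \times onstage(H))$
  $H_1 \subseteq H \cap (offstage(H) \times F \times \{n'\} \cup \{n'\} \times F \times offstage(H))$
  $H_0' = swing(n', n_0, H_0)$
  $H_1' \subseteq swing(n', n_0, H_1)$
  $swing(n_{old}, n_{new}, H) = \{\langle n_{new}, f, n\rangle \mid \langle n_{old}, f, n\rangle \in H\} \cup \{\langle n, f, n_{new}\rangle \mid \langle n, f, n_{old}\rangle \in H\} \cup \{\langle n_{new}, f, n_{new}\rangle \mid \langle n_{old}, f, n_{old}\rangle \in H\}$

Figure 15: Split Relation

  $\langle H, \rho, K\rangle \|^{n_0} \langle H, \rho, K\rangle$ if $acycCheck(n_0, \langle H, \rho, K\rangle, offstage(H))$
  $\langle H, \rho, K\rangle \|^{n_0} \langle H', \rho', K'\rangle$ if $\neg acycCheck(n_0, \langle H, \rho, K\rangle, offstage(H))$
  where
  $H' = (H \setminus H_{cyc}) \cup H_{off} \cup B_{nr} \cup B_r \cup B'_{nr} \cup B'_r \cup N_r \cup N'_{nr}$
  $H_{cyc} = \{\langle n_1, f, n_2\rangle \in H \mid n_1 \in S_{cyc} \text{ or } n_2 \in S_{cyc}\}$
  $H_{off} = \{\langle n_1', f, n_2'\rangle \mid n_1 = c(n_1'), n_2 = c(n_2'), n_1, n_2 \in offstage_0(H), n_1 \text{ or } n_2 \in S_{cyc}, \langle n_1, f, n_2\rangle \in H\} \setminus (S_r \times acyclic(r) \times S_{nr})$
  $H \cap ((onstage(H) \times F \cup \{n_0\} \times (F \setminus acyclic(r))) \times S_{cyc}) = A_{nr} \uplus A_r$
  $H \cap (S_{cyc} \times ((F \setminus acyclic(r)) \times \{n_0\} \cup F \times onstage(H))) = A'_{nr} \uplus A'_r$
  $B_{nr} = \{\langle n_1, f, h_{nr}(n_2)\rangle \mid \langle n_1, f, n_2\rangle \in A_{nr}\}$
  $B_r = \{\langle n_1, f, h_r(n_2)\rangle \mid \langle n_1, f, n_2\rangle \in A_r\}$
  $B'_{nr} = \{\langle h_{nr}(n_1), f, n_2\rangle \mid \langle n_1, f, n_2\rangle \in A'_{nr}\}$
  $B'_r = \{\langle h_r(n_1), f, n_2\rangle \mid \langle n_1, f, n_2\rangle \in A'_r\}$
  $N_r = \{\langle n_0, f, n'\rangle \mid n' \in S_r, \langle n_0, f, c(n')\rangle \in H, f \in acyclic(r)\}$
  $N'_{nr} = \{\langle n', f, n_0\rangle \mid n' \in S_{nr}, \langle c(n'), f, n_0\rangle \in H, f \in acyclic(r)\}$
  $S_{cyc} = \{n \mid \exists n_1, \ldots, n_{p-1} \in offstage(H) : \langle n_0, f_0, n_1\rangle, \ldots, \langle n_k, f_k, n\rangle, \langle n, f_{k+1}, n_{k+2}\rangle, \ldots, \langle n_{p-1}, f_{p-1}, n_0\rangle \in H,\ f_0, \ldots, f_{p-1} \in acyclic(r)\}$ (with $n = n_{k+1}$)
  $offstage_0(H) = offstage(H) \setminus \{n_0\}$
  $r = \rho(n_0)$
  $\rho'(n) = \rho(c(n))$, $K'(n) = K(c(n))$

Figure 16: A Role Graph for an Acyclic List (panels: a) before split, b) after split)

Figure 15 shows the definition of the split operation on
node $n_0$, denoted by $\|^{n_0}$. Let $G = \langle H, \rho, K\rangle$ be the initial
role graph and $\rho(n_0) = r$. If $acyclic(r) = \emptyset$, then the split
operation returns the original graph G; otherwise it proceeds
as follows. Call a path in graph H cycle-inducing if all of its
nodes are offstage and all of its edges are in $acyclic(r)$. Let
$S_{cyc}$ be the set of nodes n such that there is a cycle-inducing
path from $n_0$ to n and a cycle-inducing path from n to $n_0$.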
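Computing $S_{cyc}$ amounts to intersecting forward and backward reachability from $n_0$ over $acyclic(r)$ edges. This is a sketch under stated assumptions: edges are string triples, `acyclic_r` and `offstage` are hypothetical parameters standing for $acyclic(r)$ and $offstage(H)$, and $n_0$ is treated as the fixed endpoint of the cycle, so it is never itself placed in $S_{cyc}$.

```python
from collections import deque
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # hypothetical (source, field, target) encoding


def _reach(start: str, succ: Dict[str, Set[str]], allowed: Set[str]) -> Set[str]:
    """Nodes in `allowed` reachable from `start` in >= 1 step, staying in `allowed`."""
    seen: Set[str] = set()
    queue = deque(m for m in succ.get(start, ()) if m in allowed)
    while queue:
        n = queue.popleft()
        if n not in seen:
            seen.add(n)
            queue.extend(m for m in succ.get(n, ()) if m in allowed)
    return seen


def s_cyc(H: Set[Edge], n0: str, acyclic_r: Set[str], offstage: Set[str]) -> Set[str]:
    """Offstage nodes on a cycle-inducing path from n0 back to n0."""
    allowed = set(offstage) - {n0}
    fwd: Dict[str, Set[str]] = {}
    bwd: Dict[str, Set[str]] = {}
    for (a, f, b) in H:
        if f in acyclic_r:
            fwd.setdefault(a, set()).add(b)
            bwd.setdefault(b, set()).add(a)
    # n is in S_cyc iff n is reachable from n0 and n0 is reachable from n.
    return _reach(n0, fwd, allowed) & _reach(n0, bwd, allowed)
```

For the single-summary-node list of Figure 16 a), the summary LN node is both reachable from $n_0$ and able to reach it, so it is the one node that split must duplicate; once the back edge to $n_0$ is gone, $S_{cyc}$ is empty and split has nothing to do.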

The goal of the split operation is to split the set $S_{cyc}$ into
a fresh set of nodes $S_{nr}$ representing objects definitely not
reachable from $n_0$ along edges in $acyclic(r)$ and a fresh set of
nodes $S_r$ representing objects definitely reachable from $n_0$.
Each of the newly generated graphs $H'$ has the following
properties:

1) merging the corresponding nodes from $S_{nr}$ and $S_r$ in
$H'$ yields the original graph H;

2) $n_0$ is not a member of any cycle in $H'$ consisting of
offstage nodes and edges in $acyclic(r)$;

3) onstage nodes in $H'$ have the same number of fields and
aliases as in H.


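Let $S_0 = nodes(H) \setminus S_{cyc}$ and let $h_{nr} : S_{cyc} \to S_{nr}$ and
$h_r : S_{cyc} \to S_r$ be bijections. Define a function
$c : nodes(H') \to nodes(H)$ as follows:

$$c(n) = \begin{cases} n, & n \in S_0 \\ h_r^{-1}(n), & n \in S_r \\ h_{nr}^{-1}(n), & n \in S_{nr} \end{cases}$$

Then $H' \subseteq \{\langle n_1', f, n_2'\rangle \mid \langle c(n_1'), f, c(n_2')\rangle \in H\}$.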

Because there are two copies of $S_{cyc}$ in $H'$, there might be
multiple edges $\langle n_1', f, n_2'\rangle$ in $H'$ corresponding to an edge
$\langle c(n_1'), f, c(n_2')\rangle \in H$.
If both $n_1'$ and $n_2'$ are offstage nodes other than $n_0$, we
always include $\langle n_1', f, n_2'\rangle$ in $H'$ unless $\langle n_1', f, n_2'\rangle \in S_r \times
acyclic(r) \times S_{nr}$. The last restriction prevents cycles in $H'$.

For an edge $\langle n_1, f, n_2\rangle \in H$ where $n_1 \in onstage(H)$ and
$n_2 \in S_{cyc}$, we include in $H'$ either the edge $\langle n_1, f, h_{nr}(n_2)\rangle$
or $\langle n_1, f, h_r(n_2)\rangle$, but not both. Split generates multiple
graphs $H'$ to cover both cases. We proceed analogously if
$n_2 \in onstage(H)$ and $n_1 \in S_{cyc}$. The node $n_0$ itself is treated
in the same way as onstage nodes for $f \notin acyclic(r)$. If
$f \in acyclic(r)$, then we choose references to $n_0$ to have a
source in $S_{nr}$, whereas references from $n_0$ have their target
in $S_r$.

Details of the split construction are given in Figure 15.
The intuitive meaning of the sets of edges is the following:

$H_{off}$: edges between offstage nodes
$B_{nr}$: edges from onstage nodes to $S_{nr}$
$B_r$: edges from onstage nodes to $S_r$
$B'_{nr}$: edges from $S_{nr}$ to onstage nodes
$B'_r$: edges from $S_r$ to onstage nodes
$N_r$: $acyclic(r)$-edges from $n_0$ to $S_r$
$N'_{nr}$: $acyclic(r)$-edges from $S_{nr}$ to $n_0$


The sets $B_{nr}$ and $B_r$ are created as images of the sets
$A_{nr}$ and $A_r$, which partition edges from onstage nodes to
nodes in $S_{cyc}$. Similarly, the sets $B'_{nr}$ and $B'_r$ are created
as images of the sets $A'_{nr}$ and $A'_r$, which partition edges
from nodes in $S_{cyc}$ to onstage nodes.

We note that if in the split operation $S_{cyc} = \emptyset$, then the
operation has no effect and need not be performed. In
Figure 16, after performing a single split, there is no need to
split for subsequent elements of the list. Examples like this
indicate that split will not be invoked frequently during the
analysis.


6.2.2 Contraction 


Figure 17 shows the non-error transitions of the contraction
relation $\Downarrow$. The analysis uses contraction when a reference
to node n is removed. If there are other references to n,
the result is the original graph. Otherwise n has just gone
offstage, so the analysis invokes nodeCheck. If the check fails,
the result is $\bot_G$. If the check succeeds, contraction
invokes the normalization operation to ensure that the role
graph remains bounded. For simplicity, we use normalization
whenever nodeCheck succeeds, although it is sufficient
to perform normalization only at program points adjacent
to back edges of the control-flow graph.


Figure 17: Contraction Relation

  $\langle H, \rho, K\rangle \Downarrow^{n} \langle H, \rho, K\rangle$ if $\exists x \in var(proc) : \langle proc, x, n\rangle \in H$
  $\langle H, \rho, K\rangle \Downarrow^{n} normalize(\langle H, \rho, K\rangle)$ if $\neg\exists x \in var(proc) : \langle proc, x, n\rangle \in H$ and $nodeCheck(n, \langle H, \rho, K\rangle, offstage(H))$

Figure 18: Normalization

  $normalize(\langle H, \rho, K\rangle) = \langle H', \rho', K'\rangle$ where
  $H' = \{\langle [n_1]_\sim, f, [n_2]_\sim\rangle \mid \langle n_1, f, n_2\rangle \in H\}$
  $\rho'([n]_\sim) = \rho(n)$
  $K'([n]_\sim) = i$ if $[n]_\sim = \{n\}$ and $K(n) = i$, and $K'([n]_\sim) = s$ otherwise
  $n_1 \sim n_2$ iff $n_1 = n_2$, or $n_1, n_2 \in offstage(H)$, $\rho(n_1) = \rho(n_2)$, and $\forall n_0 \in onstage(H) : (reach(n_0, n_1)$ iff $reach(n_0, n_2))$
  $reach(n_0, n)$ iff $\exists n_1, \ldots, n_{p-1} \in offstage(H), \exists f_1, \ldots, f_p \in acyclic(\rho(n_0)) : \langle n_0, f_1, n_1\rangle, \ldots, \langle n_{p-1}, f_p, n\rangle \in H$

Normalization Figure 18 shows the normalization relation.
Normalization accepts a role graph $\langle H, \rho, K\rangle$ and produces a
normalized role graph $\langle H', \rho', K'\rangle$ which is a factor
graph of $\langle H, \rho, K\rangle$ under the equivalence relation $\sim$. Two
offstage nodes are equivalent under $\sim$ if they have the same
role and the same reachability from onstage nodes. Here we
consider node n to be reachable from an onstage node $n_0$
iff there is some path from $n_0$ to n whose edges belong to
$acyclic(\rho(n_0))$ and whose nodes are all in $offstage(H)$. Note
that, by construction, normalization avoids merging nodes
which were previously generated in the split operation $\|$,
while still ensuring a bound on the size of the role graph.
For a procedure with l local variables, f fields and r roles, the
number of nodes in a role graph is on the order of $r 2^l$, so the
maximum size of a chain in the lattice is of the order of $2^{(r 2^l)^2 f}$.
To ensure termination we consider role graphs equal up to
isomorphism. Isomorphism checking can be done efficiently
if normalization assigns canonical names to the equivalence
classes it creates.
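Normalization can be sketched as grouping offstage nodes by the pair (role, set of onstage nodes that reach it). Equivalence classes are represented by frozensets of their members, a simple stand-in for the canonical names mentioned above. The encoding, the function signature, and the `acyclic` dictionary (role name to its acyclic fields) are assumptions of this sketch.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[str, str, str]   # hypothetical (source, field, target) encoding
Cls = FrozenSet[str]          # an equivalence class, named by its members


def normalize(H: Set[Edge], rho: Dict[str, str], K: Dict[str, str],
              onstage: Set[str], offstage: Set[str],
              acyclic: Dict[str, Set[str]]):
    """Quotient the role graph by ~ (same role, same reachability)."""
    def reach_set(n0: str) -> Set[str]:
        # Offstage nodes reachable from n0 along acyclic(rho(n0)) edges,
        # passing through offstage nodes only.
        fields = acyclic.get(rho[n0], set())
        seen: Set[str] = set()
        stack = [b for (a, f, b) in H if a == n0 and f in fields and b in offstage]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(b for (a, f, b) in H
                             if a == n and f in fields and b in offstage)
        return seen

    reach = {n0: reach_set(n0) for n0 in onstage}
    groups: Dict[Tuple[str, FrozenSet[str]], Set[str]] = {}
    for n in offstage:
        key = (rho[n], frozenset(n0 for n0 in onstage if n in reach[n0]))
        groups.setdefault(key, set()).add(n)
    cls: Dict[str, Cls] = {}
    for members in groups.values():
        for n in members:
            cls[n] = frozenset(members)
    # Onstage nodes, proc, and null are never merged with anything.
    for n in {a for (a, _, _) in H} | {b for (_, _, b) in H}:
        cls.setdefault(n, frozenset({n}))
    H2 = {(cls[a], f, cls[b]) for (a, f, b) in H}
    rho2 = {c: rho.get(next(iter(c))) for c in cls.values()}
    K2 = {c: "i" if len(c) == 1 and K.get(next(iter(c))) == "i" else "s"
          for c in cls.values()}
    return H2, rho2, K2
```

In the usage below, the list cells after the onstage node x0 merge into one summary class, while the cell that is not reachable from x0 stays separate, which is how the distinction introduced by split survives normalization.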


6.2.3 Symbolic Execution 


Figure 19 shows the symbolic execution relation $\Rightarrow$. In
most cases, the symbolic execution of a statement acts on
the abstract heap in the same way that the statement would
act on the concrete heap. In particular, the Store statement
always performs strong updates. The simplicity of symbolic
execution is due to conditions 3) and 5) in the abstraction
relation $\alpha$. These conditions are ensured by the $\Uparrow$ relation,
which instantiates nodes, allowing strong updates. The
symbolic execution also verifies the consistency conditions that
are not verified by $\Uparrow$ or $\Downarrow$.


Verifying Reference Removal Consistency The abstract execution ⇒ for the Store statement can easily verify the Store safety condition from Section 5.4.2, because the set of onstage and offstage nodes is known precisely for every role graph. It returns the error graph ⊥_G if the safety condition fails.


Symbolic Execution of setRole The setRole(x:r) statement sets the role of the node n_x referenced by variable x to r. Let G = (H, ρ, K) be the current role graph and let ⟨proc, x, n_x⟩ ∈ H. If n_x has no adjacent offstage nodes, the role change always succeeds. In general, there are restrictions on when the change can be done. Let (H_c, ρ_c) be a concrete heap with role assignment represented by G, and let h be a homomorphism from H_c to H. Let h(o_x) = n_x and r_0 = ρ_c(o_x). The symbolic execution must make sure that the condition conW(ρ_c, H_c, offstage(H_c)) continues to hold after the role change. Because the set of onstage nodes does not change, it suffices to ensure that the original roles for offstage nodes are consistent with the new role r. The acyclicity constraint involves only offstage nodes, so it remains satisfied. The other role constraints are local, so they can only be violated for offstage neighbors of n_x. To make sure that no violations occur, we require:


1. r ∈ fields_f(ρ(n)) for all ⟨n, f, n_x⟩ ∈ H, and


2. (r, f) ∈ slots_i(ρ(n)) for all ⟨n_x, f, n⟩ ∈ H and every slot i such that (r_0, f) ∈ slots_i(ρ(n)).


This is sufficient to guarantee conW(ρ_c, H_c, offstage(H_c)). To ensure condition 2) in Definition 22 of the abstraction relation, we require that for every (f, g) ∈ identities(r),


1. (f, g) ∈ identities(r_0), or


2. for all ⟨n_x, f, n⟩ ∈ H: K(n) = i and (⟨n, g, n′⟩ ∈ H implies n′ = n_x).
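The field and slot requirements on setRole (the first pair of conditions above) can be checked edge by edge. The Python sketch below is illustrative only: it assumes hypothetical encodings fields[role][f] for fields_f(r) and slots[role] as a list of sets of (role, field) pairs for slots_i(r), considers heap edges only (no variable edges or null), and does not cover the identity requirements.

```python
def set_role_ok(nx, r, r0, edges, roles, fields, slots):
    """Field and slot conditions for setRole(x:r), where x references node nx
    and r0 is nx's current role.
    edges:  heap edges (src, field, dst); variable edges and null are omitted
    fields: fields[role][f] = set of roles allowed in field f of that role
    slots:  slots[role] = list of slots, each a set of (role, field) pairs
    """
    for src, f, dst in edges:
        # Condition 1: a neighbor's field f that points to nx must admit role r.
        if dst == nx and r not in fields.get(roles[src], {}).get(f, set()):
            return False
        # Condition 2: every slot of the target that admitted (r0, f)
        # must also admit (r, f).
        if src == nx:
            for slot in slots.get(roles[dst], []):
                if (r0, f) in slot and (r, f) not in slot:
                    return False
    return True
```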


Symbolic Execution of roleCheck To symbolically execute roleCheck(x_1, …, x_p, ra), we ensure that the conW predicate of the concrete semantics is satisfied for the concrete heaps that correspond to the current abstract role graph. The symbolic execution for roleCheck returns the error graph ⊥_G if ρ is inconsistent with ra or if any of the nodes n_i referenced by x_i fails to satisfy nodeCheck.


6.2.4 Node Check 


The analysis uses the nodeCheck predicate to incrementally 
maintain the abstraction relation. We first define the pred- 
icate localCheck, which roughly corresponds to the predi- 
cate locallyConsistent (Definition 2), but ignores the nonlo- 
cal acyclicity condition and additionally ensures condition 
2) from Definition 22. 


Definition 25 For a role graph G = (H, ρ, K), an individual node n and a set S, the predicate localCheck(n, G) holds iff the following conditions are met. Let r = ρ(n).


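$$
\begin{array}{ll}
\textbf{Statement} & \textbf{Transition and side conditions} \\[4pt]
\texttt{x=y.f} &
(H \uplus \{\langle proc, x, n_x\rangle\}, \rho, K) \Rightarrow (H \uplus \{\langle proc, x, n_f\rangle\}, \rho, K) \\
& \quad \langle proc, y, n_y\rangle, \langle n_y, f, n_f\rangle \in H \\[4pt]
\texttt{x.f=y} &
(H \uplus \{\langle n_x, f, n_f\rangle\}, \rho, K) \Rightarrow (H \uplus \{\langle n_x, f, n_y\rangle\}, \rho, K) \\
& \quad \langle proc, x, n_x\rangle, \langle proc, y, n_y\rangle \in H,\ n_f \in \mathrm{onstage}(H) \\[4pt]
\texttt{x=new} &
(H \uplus \{\langle proc, x, n_x\rangle\}, \rho, K) \Rightarrow (H \uplus \{\langle proc, x, n_n\rangle\}, \rho', K) \\
& \quad n_n \text{ fresh},\ \rho' = \rho[n_n \mapsto \mathit{unknown}] \\[4pt]
\texttt{x=y} &
(H \uplus \{\langle proc, x, n_x\rangle\}, \rho, K) \Rightarrow (H \uplus \{\langle proc, x, n_y\rangle\}, \rho, K) \\
& \quad \langle proc, y, n_y\rangle \in H \\[4pt]
\textit{condition } e &
(H, \rho, K) \Rightarrow (H, \rho, K) \quad \mathrm{satisfied}(e, H) \\[4pt]
\texttt{setRole(x:r)} &
(H, \rho, K) \Rightarrow (H, \rho[n_x \mapsto r], K) \\
& \quad \langle proc, x, n_x\rangle \in H,\ \mathrm{roleChOk}(n_x, r, (H, \rho, K)) \\[4pt]
\texttt{roleCheck}(x_1, \ldots, x_p, ra) &
(H, \rho, K) \Rightarrow (H, \rho, K) \\
& \quad \forall i : \langle proc, x_i, n_i\rangle \in H,\ \mathrm{nodeCheck}(n_i, (H, \rho, K), S),\ \rho(n_i) = ra(n_i), \\
& \quad S = \mathrm{offstage}(H) \cup \{n_i\}_i
\end{array}
$$

$$
\begin{aligned}
\mathrm{satisfied}(\texttt{x==y}, H) &\iff \{o \mid \langle proc, x, o\rangle \in H\} = \{o \mid \langle proc, y, o\rangle \in H\} \\
\mathrm{satisfied}(\texttt{!(x==y)}, H) &\iff \text{not } \mathrm{satisfied}(\texttt{x==y}, H)
\end{aligned}
$$

Figure 19: Symbolic Execution of Basic Statements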


1A. (Outgoing fields check) For fields f ∈ F, if ⟨n, f, n′⟩ ∈ H then ρ(n′) ∈ fields_f(r).

2A. (Incoming slots check) Let {⟨n_1, f_1⟩, …, ⟨n_k, f_k⟩} = {⟨n′, f⟩ | ⟨n′, f, n⟩ ∈ H} be the set of all aliases of node n in abstract heap H. Then k = slotno(r) and there exists a permutation σ of the set {1, …, k} such that (ρ(n_i), f_i) ∈ slots_{σ(i)}(r) for all i.

3A. (Identity Check) If ⟨n, f, n′⟩ ∈ H, ⟨n′, g, n″⟩ ∈ H, (f, g) ∈ identities(r), and K(n′) = i, then n = n″.

4A. (Neighbor Identity Check) For every edge ⟨n′, f, n⟩ ∈ H, if K(n′) = i, ρ(n′) = r′ and (f, g) ∈ identities(r′), then ⟨n, g, n′⟩ ∈ H.

5A. (Field Sanity Check) For every f ∈ F there is exactly one edge ⟨n, f, n′⟩ ∈ H.


Conditions 1A and 2A correspond to conditions 1) and 2) in Definition 2. Condition 3) in Definition 19 is not necessarily implied by condition 3A) if some of the neighbors of n are summary nodes. Condition 3) cannot be established based only on summary nodes, because verifying an identity constraint for field f of node n where ⟨n, f, n′⟩ ∈ H requires knowing the identity of n′, not only its existence and role. We therefore rely on condition 2) of Definition 22 to ensure that identity relations of neighbors of node n are satisfied before n moves offstage.
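Condition 2A asks for a permutation σ, which is exactly a perfect bipartite matching between the aliases of n and the slots of its role. A minimal Python sketch, assuming a hypothetical encoding of slots[role] as a list of sets of (role, field) pairs and H as a set of (source, field, target) triples, can find it with augmenting paths (Kuhn's algorithm) instead of enumerating all k! permutations; this choice of algorithm is ours and is not prescribed by the analysis.

```python
def incoming_slots_ok(n, edges, roles, slots):
    """Condition 2A: the aliases <n', f> of n can be matched one-to-one with the
    slots of rho(n) so that (rho(n'), f) belongs to the matched slot."""
    aliases = sorted({(src, f) for src, f, dst in edges if dst == n})
    role_slots = slots.get(roles[n], [])
    if len(aliases) != len(role_slots):  # k = slotno(r)
        return False
    owner = {}  # slot index -> index of the alias currently assigned to it

    def assign(i, visited):
        src, f = aliases[i]
        for j, slot in enumerate(role_slots):
            if (roles[src], f) in slot and j not in visited:
                visited.add(j)
                # take a free slot, or move its current alias to another slot
                if j not in owner or assign(owner[j], visited):
                    owner[j] = i
                    return True
        return False

    return all(assign(i, set()) for i in range(len(aliases)))
```

A greedy assignment could put the first alias into a slot that the second alias needs; the augmenting step moves the first alias to another compatible slot instead of failing.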

The predicate acycCheck(n, G, S) verifies the acyclicity condition from Definition 19.


Definition 26 We say that node n satisfies an acyclicity check in graph G = (H, ρ, K) with respect to set S, and we write acycCheck(n, G, S), iff it is not the case that H contains a cycle n_1, f_1, …, n_s, f_s, n_1 where n_1 = n, f_1, …, f_s ∈ acyclic(ρ(n)) and n_1, …, n_s ∈ S.
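acycCheck amounts to asking whether n can reach itself through nodes of S using only fields in acyclic(ρ(n)), so a depth-first search suffices. The Python sketch below assumes a hypothetical encoding in which H is a set of (source, field, target) triples and ρ and acyclic are dictionaries; a self-loop on n counts as a cycle with s = 1.

```python
from collections import defaultdict


def acyc_check(n, edges, roles, acyclic, S):
    """acycCheck(n, G, S): no cycle n = n_1, f_1, ..., n_s, f_s, n_1 with all
    fields in acyclic(rho(n)) and all nodes in S."""
    if n not in S:
        return True  # every such cycle would contain n itself
    allowed = acyclic.get(roles[n], set())
    succ = defaultdict(list)
    for src, f, dst in edges:
        if f in allowed and src in S and dst in S:
            succ[src].append(dst)
    stack, seen = [n], {n}
    while stack:
        cur = stack.pop()
        for dst in succ[cur]:
            if dst == n:
                return False  # a path back to n closes a forbidden cycle
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return True
```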


This enables us to define the nodeCheck predicate. 


Definition 27 nodeCheck(n, G,S) holds iff both predicates 
localCheck(n, G) and acycCheck(n, G, S) hold. 
7 Interprocedural Role Analysis


This section describes the interprocedural aspects of our role analysis. Interprocedural role analysis can be viewed as an instance of the functional approach to interprocedural data-flow analysis [41]. For each program point p, role analysis approximates program traces from procedure entry to point p. The solution in [41] proposes tagging the entire data-flow fact G at point p with the data-flow fact G_0 at procedure entry. In contrast, our analysis computes the correspondence between heaps at procedure entry and heaps at point p at the granularity of sets of objects that constitute role graphs. This allows our analysis to detect which regions of the heap have been modified. We approximate the concrete executions of a procedure with procedure transfer relations consisting of 1) an initial context and 2) a set of effects. Effects are fine-grained transfer relations that summarize load and store statements and can naturally describe local heap modifications. In this paper we assume that procedure transfer relations are supplied, and we are concerned with a) verifying that transfer relations are a conservative approximation of the procedure implementation, and b) instantiating transfer relations at call sites.


7.1 Procedure Transfer Relations


A transfer relation for a procedure proc extends the procedure signature with an initial context context(proc) and procedure effects effect(proc).


7.1.1 Initial Context 


Figures 20 and 21 contain examples of initial context specifications. An initial context is a description of the initial role graph (H_ic, ρ_ic, K_ic), where ρ_ic and K_ic are determined by a nodes declaration and H_ic is determined by an edges declaration. The initial role graph specifies a set of concrete heaps at procedure entry and assigns names for sets of nodes in these heaps. The next definition is similar to Definition 22.


Definition 28 We say that a concrete heap (H_c, ρ_c) is represented by the initial role graph (H_ic, ρ_ic, K_ic), and write (H_c, ρ_c) α_0 (H_ic, ρ_ic, K_ic), iff there exists a function h_0 : nodes(H_c) → nodes(H_ic) such that

1. conW(ρ_c, H_c, h_0^{-1}(read(proc)));

2. h_0 is a graph homomorphism;

3. K_ic(n) = i implies |h_0^{-1}(n)| ≤ 1;

4. h_0(null_c) = null and h_0(proc_c) = proc;

5. ρ_c(o) = ρ_ic(h_0(o)) for every object o ∈ nodes(H_c).


Here read(proc) is the set of initial-context nodes read by 
the procedure (see below). For simplicity, we assume one 
context per procedure; it is straightforward to generalize the 
treatment to multiple contexts. 

A context is specified by declaring a list of nodes and a 
list of edges. 

A list of nodes is given with the nodes declaration. It specifies a role for every node at procedure entry. Individual nodes are denoted with lowercase identifiers, summary nodes with uppercase identifiers. By using summary nodes it is possible to indicate disjointness of entire heap regions and reachability between nodes in the heap.

There are two kinds of edges in the initial role graph: parameter edges and heap edges. A parameter edge p->pn is interpreted as ⟨proc, p, pn⟩ ∈ H_ic. We require every parameter edge to have an individual node as its target; we call such a node a parameter node. The role of a parameter node referenced by param_i(proc) is always preR_i(proc). Since different nodes in the initial role graph denote disjoint sets of concrete objects, the parameter edges


p1 -> n1
p2 -> n1


imply that parameters p1 and p2 must be aliased,


p1 -> n1
p2 -> n2


force p1 and p2 to be unaliased, whereas 


p1 -> n1|n2
p2 -> n1|n2


allow for both possibilities. A heap edge n -f-> m denotes ⟨n, f, m⟩ ∈ H_ic. The shorthand notation


n1 -f-> n2
   -g-> n3


denotes two heap edges ⟨n1, f, n2⟩, ⟨n1, g, n3⟩ ∈ H_ic. An expression n1 -f-> n2|n3 denotes two edges n1 -f-> n2 and n1 -f-> n3. We use similar shorthands for parameter edges.
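Because parameter nodes are individual and distinct initial-context nodes denote disjoint sets of objects, the aliasing patterns permitted by a set of parameter edges can be enumerated by choosing one target per parameter. In the hypothetical Python sketch below, param_edges maps each parameter to its list of alternative targets, so p1 -> n1|n2 becomes {"p1": ["n1", "n2"]}; each result is the set of parameter pairs that are aliased in one choice.

```python
from itertools import combinations, product


def aliasing_configurations(param_edges):
    """Return every aliasing pattern allowed by the parameter edges, each as a
    frozenset of (parameter, parameter) pairs that reference the same node."""
    params = sorted(param_edges)
    patterns = set()
    for choice in product(*(param_edges[p] for p in params)):
        target = dict(zip(params, choice))
        aliased = frozenset(
            (a, b) for a, b in combinations(params, 2) if target[a] == target[b]
        )
        patterns.add(aliased)
    return patterns
```

The three examples above then yield, respectively, only the aliased pattern, only the unaliased pattern, and both.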


Example 29 Figure 20 shows an initial context graph for the kill procedure from Example 17. It is a refinement of the role reference diagram of Figure 1, as it gives a description of the heap specific to the entry of the kill procedure. The initial context makes explicit the fact that there is only one header node for the list of running processes (ph) and one
nodes ph : RunningHeader,
      P1, px, P2 : RunningProc,
      lx : LiveHeader,
      LL1, l2, LL2 : LiveList;
edges p-> px, l-> lx,
      ph -next-> P1|px
         -prev-> px|P2,
      P1 -next-> P1|px
         -prev-> ph|P1,
      px -next-> P2|ph
         -prev-> P1|ph,
      P2 -next-> P2|ph
         -prev-> P2|px,
      lx -next-> LL1|l2,
      LL1 -next-> LL1|l2
          -proc-> P1|P2|SleepingProc,
      l2 -next-> LL2|null
         -proc-> px,
      LL2 -next-> LL2|null
          -proc-> P1|P2|SleepingProc

Figure 20: Initial Context for kill Procedure


header node for the list of all active processes (lx). More importantly, it shows that traversing the list of active processes reaches a node l2 whose proc field references the parameter node px. This is sufficient for the analysis to conclude that there will be no null pointer dereferences in the while loop of the kill procedure, since l2 is reached before null.


We assume that the initial context always contains the role 
reference diagram RRD (Definition 8). Nodes from RRD are 
called anonymous nodes and are referred to via role name. 
This further reduces the size of initial context specifications 
by leveraging global role definitions. In Figure 20 there is 
no need to specify edges originating from SleepingProc or 
even mention the node SleepingTree, since role definitions 
alone contain enough information on this part of the heap 
to enable the analysis of the procedure. 


procedure insert(l : L,
                 x : IsolatedN ->> LN)
nodes ln, xn;
edges l-> ln, x-> xn,
      ln -next-> LN|null;
effects ln|LN . next = xn,
        ! xn.next = LN|null;
local c, p;
{
  p = l;
  c = l.next;
  while (c != null) {
    p = c;
    c = p.next;
  }
  p.next = x;
  x.next = c;
  setRole(x:LN);
}

Figure 21: Insert Procedure for Acyclic List


7.1.2 Procedure Effects 


Procedure effects conservatively approximate the region of 
the heap that the procedure accesses and indicate changes 
to the referencing relationships in that region. There are two 
kinds of effects: read effects and write effects. 

A read effect specifies a set read(proc) of initial graph 
nodes accessed by the procedure. It is used to ensure that 
the accessibility condition in Section 5.4.3 is satisfied. If the 
set of nodes denoted by read(proc) is mapped to a node n 
which is onstage in the caller but is not an argument of the 
procedure call, a role check error is reported at the call site. 

Write effects are used to modify the caller’s role graph to conservatively model the procedure call. A write effect e_1.f = e_2 approximates Store operations within a procedure. The expression e_1 denotes the objects being written to, f denotes the field written, and e_2 denotes the set of objects that could be assigned to the field. Write effects are may effects by default, which means that the procedure is free not to perform them. It is possible to specify that a write effect must be performed by prefixing it with a “!” sign.
Example 30 In Figure 21, the insert procedure inserts an isolated cell at the end of an acyclic singly linked list. As a result, the role of the cell changes to LN. The initial context declares parameter nodes ln and xn (whose initial roles are deduced from the roles of the parameters), and mentions the anonymous LN node from a default copy of the role reference diagram RRD. The code of the procedure is summarized with two write effects. The first write effect indicates that the procedure may perform zero or more Store operations to field next of nodes mapped to ln or LN in context(proc). The second write effect indicates that the execution of the procedure must perform a Store to the field next of the xn node, where the reference stored is either a node mapped onto the anonymous LN node or null.


Effects also describe assignments that procedures perform 
on the newly created nodes. Here we adopt a simple solution 
of using a single summary node denoted NEW to represent 
procedure insertSome(l : L)
nodes ln;
edges l-> ln,
      ln -next-> LN|null;
effects ln|LN . next = NEW,
        NEW.next = LN|null;
aux c, p, x;
{
  p = l;
  c = l.next;
  while (c != null) {
    p = c;
    c = p.next;
  }
  x = new;
  p.next = x;
  x.next = c;
  setRole(x:LN);
}

Figure 22: Insert Procedure with Object Allocation


all nodes created inside the procedure. We write nodes_0(H_ic) for the set nodes(H_ic) ∪ {NEW}.


Example 31 Procedure insertSome in Figure 22 is similar 
to procedure insert in Figure 21, except that the node in- 
serted is created inside the procedure. It is therefore referred 
to in effects via generic summary node NEW. 


We represent all may write effects as a set mayWr(proc) of triples ⟨n_i, f, n_j⟩ where n_i, n_j ∈ nodes_0(H_ic) and f ∈ F. We represent must write effects as a sequence mustWr_j(proc) of subsets of the set K_ic^{-1}(i) × F × nodes_0(H_ic). Here 1 ≤ j ≤ mustWrNo(proc).

To simplify the interpretation of the declared procedure effects in terms of concrete reads and writes, we require the union ⋃_j mustWr_j(proc) to be disjoint from the set mayWr(proc). We also require the nodes n_1, …, n_k in a must write effect n_1|⋯|n_k.f = e_2 to be individual nodes. This allows strong updates when instantiating effects (Section 7.3.2).


7.1.3 Semantics of Procedure Effects


We now give precise meaning to procedure effects. Our def- 
inition is slightly complicated by the desire to capture the 
set of nodes that are actually read in an execution while still 
allowing a certain amount of observational equivalence for 
write effects. 

The effects of procedure proc define a subset of permissible program traces in the following way. Consider a concrete heap H_c with role assignment ρ_c such that (H_c, ρ_c) α_0 (H_ic, ρ_ic, K_ic) with graph homomorphism h_0 from Definition 28. Consider a trace T starting from a state with heap H_c and role assignment ρ_c. Extract the subsequence of all loads and stores in trace T. Replace each Load x=y.f by a concrete read read o_x, where o_x is the concrete object referenced by x at the point of the Load, and replace each Store x.f=y by a concrete write o_x.f = o_y, where o_x is the object referenced by x and o_y the object referenced by y at the point of the Store. Let p_1, …, p_l be the sequence of all concrete read statements and q_1, …, q_k the sequence of all concrete write statements. We say that trace T starting at H_c conforms to the effects iff for all choices of h_0 the following conditions hold:

1. h_0(o) ∈ read(proc) for every p_i of the form read o;

2. there exists a subsequence q_{i_1}, …, q_{i_t} of q_1, …, q_k such that

   (a) executing q_{i_1}, …, q_{i_t} on H_c yields the same result as executing the entire sequence q_1, …, q_k, and

   (b) the sequence q_{i_1}, …, q_{i_t} implements the write effects of procedure proc.


A typical way to obtain a sequence q_{i_1}, …, q_{i_t} from the sequence q_1, …, q_k is to keep only the last write for each pair (o_j, f) of object and field.
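Condition (a) can be illustrated with a Python sketch of this last-write folding. As a simplifying assumption, the heap is modeled as a dictionary from (object, field) pairs to values, and fold_last_writes and run_writes are hypothetical helper names. The folded subsequence leaves the same final heap as the full sequence, because every dropped write to a pair is later overwritten.

```python
def fold_last_writes(writes):
    """Keep only the last write to each (object, field) pair, in original order.
    writes: list of (obj, field, value) triples, i.e. concrete writes o.f = o'."""
    last = {}
    for i, (obj, field, _) in enumerate(writes):
        last[(obj, field)] = i
    return [w for i, w in enumerate(writes) if last[(w[0], w[1])] == i]


def run_writes(heap, writes):
    """Execute writes on a heap modeled as a dict (obj, field) -> value."""
    result = dict(heap)
    for obj, field, value in writes:
        result[(obj, field)] = value
    return result
```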

We say that a sequence q_{i_1}, …, q_{i_t} implements the write effects mayWr(proc) and mustWr_i(proc) for 1 ≤ i ≤ i_0, where i_0 = mustWrNo(proc), if and only if there exists an injection s : {1, …, i_0} → {i_1, …, i_t} such that

1. ⟨h′(o), f, h′(o′)⟩ ∈ mustWr_j(proc) for every j ∈ {1, …, i_0} and every concrete write q_{s(j)} of the form o.f = o′, and

2. ⟨h′(o), f, h′(o′)⟩ ∈ mayWr(proc) for all concrete writes q_i of the form o.f = o′ for i ∈ {i_1, …, i_t} \ {s(1), …, s(i_0)}.

Here h′(n) = h_0(n) for n ∈ nodes(H_c), where H_c is the initial concrete heap, and h′(n) = NEW otherwise.

It is possible (although not very common) for a single concrete heap H_c to have multiple homomorphisms h_0 to the initial context H_ic. Note that in this case we require the trace T to conform to the effects for all possible valid choices of h_0. This places the burden of multiple choices of h_0 on procedure transfer relation verification (Section 7.2), but in turn allows the context matching algorithm in Section 7.3.1 to select an arbitrary homomorphism between a caller’s role graph and an initial context.


7.2 Verifying Procedure Transfer Relations 


In this section we show how the analysis makes sure that a procedure conforms to its specification, expressed as an initial context with a list of effects. To verify procedure effects, we extend the analysis representation from Section 6.1. A non-error role graph is now a tuple (H, ρ, K, τ, E) where:


1. τ : nodes(H) → nodes_0(H_ic) is the initial context transformation, which assigns an initial context node τ(n) ∈ nodes(H_ic) to every node n representing objects that existed prior to the procedure call, and assigns NEW to every node representing objects created during the procedure activation;


2. E ⊆ ⋃_j mustWr_j(proc) is a list of must write effects that the procedure has performed so far.


The initial context transformation τ tracks how objects have moved since the beginning of the procedure activation and is essential for verifying procedure effects, which refer to initial context nodes.

We represent the list E of performed must effects as a partial map from the set K_ic^{-1}(i) × F to nodes_0(H_ic). This allows the analysis to perform must effect folding by recording only the last must effect for every pair (n, f) of individual node n and field f.


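$$
[\![\mathrm{entry}_{proc}]\!] = \left\{ (H, \rho, K, \tau, E) \;\middle|\;
\begin{array}{l}
P : \{proc\} \times \{\mathrm{param}_i(proc)\}_i \to N,\quad P \subseteq H_{ic} \\
H_0 = (H_{ic} \setminus \{proc\} \times \mathrm{param}(proc) \times N) \cup P \\
n_i = P(proc, \mathrm{param}_i(proc)) \\
H_1 \subseteq H_0 \\
H_0 \setminus H_1 \subseteq \{ \langle n', f, n'' \rangle \mid \{n', n''\} \cap \{n_i\}_i \neq \emptyset \} \\
\forall j : \mathrm{localCheck}(n_j, (H_1, \rho, K), \mathrm{nodes}(H_1)) \\
H_1 \parallel^{N_1}_{n_1} H_2 \parallel^{N_2}_{n_2} \cdots \parallel^{N_p}_{n_p} H \\
\rho = \rho_{ic},\quad K = K_{ic},\quad \tau = \mathrm{id},\quad E = \emptyset
\end{array}
\right\}
$$

Figure 23: The Set of Role Graphs at Procedure Entry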


7.2.1 Role Graphs at Procedure Entry 


Our role analysis creates the set of role graphs at proce- 
dure entry point from the initial context context(proc). This 
is simple because role graphs and the initial context have 
similar abstraction relations (Sections 6.1 and 7.1). The dif- 
ference is that parameters in role graphs point to exactly one 
node, and parameter nodes are onstage nodes in role graphs 
which means that all their edges are “must” edges. 

Figure 23 shows the construction of the initial set of role graphs. First, the graph H_0 is created such that every parameter param_i(proc) references exactly one parameter node n_i. Next, graph H_1 is created by using localCheck to ensure that parameter nodes have the appropriate number of edges. Finally, instantiation is performed on parameter nodes to ensure acyclicity constraints if the initial context does not make them explicit already.


7.2.2 Verifying Basic Statements 


To ensure that a procedure conforms to its transfer relation, the analysis uses the initial context transformation τ to assign every Load and Store statement to a declared effect. Figure 24 shows the new symbolic execution of Load, Store and New statements.

The symbolic execution of the Load statement x=y.f makes sure that the node being loaded is recorded in some read effect. If this is not the case, an error is reported.

The symbolic execution of the Store statement x.f=y first retrieves the nodes τ(n_x) and τ(n_y) in the initial context that correspond to nodes n_x and n_y in the current role graph. If the effect ⟨τ(n_x), f, τ(n_y)⟩ is declared as a may write effect, the execution proceeds as usual. If it is declared as a must write effect, it is used to update the list E of must write effects; if it is declared as neither, the result is the error graph ⊥_G. The list E is checked at the end of procedure execution.

The symbolic execution of the New statement updates the initial context transformation τ by assigning τ(n_n) = NEW for the new node n_n.
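The Store rules amount to a three-way classification of the effect ⟨τ(n_x), f, τ(n_y)⟩. In the Python sketch below (hypothetical names and encodings), E is a dictionary from (node, field) to node, so updateWr simply overwrites the entry for (n_1, f), which is what folds repeated must writes to the same field; None stands for the error graph ⊥_G.

```python
def verify_store(tau, nx, f, ny, may_wr, must_wr, performed):
    """Effect check for Store x.f=y, where x and y reference nodes nx and ny.
    tau:       current node -> initial-context node (or "NEW")
    may_wr:    set of (n1, f, n2) triples (mayWr)
    must_wr:   list of sets of triples (mustWr_1, ..., mustWr_k)
    performed: partial map E as a dict (n1, f) -> n2
    Returns the new E, or None to stand for the error graph."""
    effect = (tau[nx], f, tau[ny])
    if effect in may_wr:
        return performed  # declared may effect: E is unchanged
    if any(effect in group for group in must_wr):
        updated = dict(performed)
        updated[(effect[0], f)] = effect[2]  # updateWr: E[(n1, f) -> n2]
        return updated
    return None  # declared neither as a may nor as a must effect
```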


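$$
\begin{array}{ll}
\textbf{Statement} & \textbf{Transition and side conditions} \\[4pt]
\texttt{x=y.f} & (H \uplus \{\langle proc,x,n_x\rangle\},\rho,K,\tau,E) \Rightarrow (H \uplus \{\langle proc,x,n_f\rangle\},\rho,K,\tau,E) \\
& \quad \langle proc,y,n_y\rangle, \langle n_y,f,n_f\rangle \in H,\ \tau(n_f) \in \mathrm{read}(proc) \\[4pt]
\texttt{x=y.f} & (H \uplus \{\langle proc,x,n_x\rangle\},\rho,K,\tau,E) \Rightarrow \bot_G \\
& \quad \langle proc,y,n_y\rangle, \langle n_y,f,n_f\rangle \in H,\ \tau(n_f) \notin \mathrm{read}(proc) \\[4pt]
\texttt{x.f=y} & (H \uplus \{\langle n_x,f,n_f\rangle\},\rho,K,\tau,E) \Rightarrow (H \uplus \{\langle n_x,f,n_y\rangle\},\rho,K,\tau,E) \\
& \quad \langle proc,x,n_x\rangle, \langle proc,y,n_y\rangle \in H,\ \langle \tau(n_x), f, \tau(n_y)\rangle \in \mathrm{mayWr}(proc) \\[4pt]
\texttt{x.f=y} & (H \uplus \{\langle n_x,f,n_f\rangle\},\rho,K,\tau,E) \Rightarrow (H \uplus \{\langle n_x,f,n_y\rangle\},\rho,K,\tau,E') \\
& \quad \langle proc,x,n_x\rangle, \langle proc,y,n_y\rangle \in H,\ \langle \tau(n_x), f, \tau(n_y)\rangle \in \textstyle\bigcup_j \mathrm{mustWr}_j(proc), \\
& \quad E' = \mathrm{updateWr}(E, \langle \tau(n_x), f, \tau(n_y)\rangle) \\[4pt]
\texttt{x.f=y} & (H \uplus \{\langle n_x,f,n_f\rangle\},\rho,K,\tau,E) \Rightarrow \bot_G \\
& \quad \langle proc,x,n_x\rangle, \langle proc,y,n_y\rangle \in H,\ \langle \tau(n_x), f, \tau(n_y)\rangle \notin \mathrm{mayWr}(proc) \cup \textstyle\bigcup_j \mathrm{mustWr}_j(proc) \\[4pt]
\texttt{x=new} & (H \uplus \{\langle proc,x,n_x\rangle\},\rho,K,\tau,E) \Rightarrow (H \uplus \{\langle proc,x,n_n\rangle\},\rho,K,\tau',E) \\
& \quad n_n \text{ fresh},\ \tau' = \tau[n_n \mapsto \mathit{NEW}]
\end{array}
$$

$$
\mathrm{updateWr}(E, \langle n_1, f, n_2\rangle) = E[(n_1, f) \mapsto n_2]
$$

Figure 24: Verifying Load, Store, and New Statements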


The τ transformation is similarly updated during other abstract heap operations. Instantiation of node n′ into node n_0 assigns τ(n_0) = τ(n′), split copies values of τ into the new set of isomorphic nodes, and normalization does not merge nodes n_1 and n_2 if τ(n_1) ≠ τ(n_2).


7.2.3 Verifying Procedure Postconditions 


At the end of the procedure, the analysis verifies that ρ(n_i) = postR_i(proc) where ⟨proc, param_i(proc), n_i⟩ ∈ H, and then performs a node check on all onstage nodes using the predicate nodeCheck(n, (H, ρ, K), nodes(H)) for all n ∈ onstage(H).

At the end of the procedure, the analysis also verifies that every performed effect in E = {e_1, …, e_k} can be attributed to exactly one declared must effect. This means that k = mustWrNo(proc) and there exists a permutation s of the set {1, …, k} such that e_{s(j)} ∈ mustWr_j(proc) for all j.


7.3 Analyzing Call Sites


The set of role graphs at the procedure call site is updated based on the procedure transfer relation as follows. Consider procedure proc containing call site p ∈ N_CFG(proc) with procedure call proc′(x_1, …, x_p). Let (H_ic, ρ_ic, K_ic) = context(proc′) be the initial context of the callee.

Figure 25 shows the transfer function for procedure call 
sites. It has the following phases: 


1. Parameter Check ensures that roles of parameters conform to the roles expected by the callee proc′.


2. Context Matching (matchContext) ensures that the caller’s role graphs represent a subset of the concrete heaps represented by context(proc′). This is done by deriving a mapping μ from the caller’s role graph to nodes(H_ic).


3. Effect Instantiation uses the effects mayWr(proc′) and mustWr_j(proc′) to approximate all structural changes to the role graph that proc′ may perform.
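$$
\begin{array}{l}
[\![ proc'(x_1, \ldots, x_p) ]\!](\mathcal{G}) = \\
\quad \textbf{if } \exists G \in \mathcal{G} : \neg\,\mathrm{paramCheck}(G) \textbf{ then } \{\bot_G\} \\
\quad \textbf{else try } \mathcal{G}_1 = \mathrm{matchContext}(\mathcal{G}) \\
\qquad \textbf{if failed then } \{\bot_G\} \\
\qquad \textbf{else } \{ G'' \mid (G, \mu) \in \mathcal{G}_1,\ (\mathrm{addNEW}(G), \mu) \longrightarrow G',\ G' \xrightarrow{AR} G'' \}
\end{array}
$$

$$
\begin{aligned}
\mathrm{paramCheck}((H, \rho, K, \tau, E)) &\iff \forall n_i : \mathrm{nodeCheck}(n_i, G, \mathrm{offstage}(H) \cup \{n_j\}_j), \\
&\qquad \text{where the } n_i \text{ are such that } \langle proc, x_i, n_i \rangle \in H \\
\mathrm{addNEW}((H, \rho, K, \tau, E)) &= (H \cup \{n_0\} \times F \times \{\mathit{null}\},\ \rho[n_0 \mapsto \mathit{unknown}],\ K[n_0 \mapsto s],\ \tau[n_0 \mapsto \mathit{NEW}],\ E), \\
&\qquad \text{where } n_0 \text{ is fresh in } H
\end{aligned}
$$

Figure 25: Procedure Call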


4. Role Reconstruction (AR) uses final roles for parameter nodes and global role declarations postR_i(proc′) to reconstruct roles of all nodes in the part of the role graph representing the modified region of the heap.


The parameter check requires nodeCheck(n_i, G, offstage(H) ∪ {n_j}_j) for the parameter nodes n_i. The other three phases are explained in more detail below.


7.3.1 Context Matching 


Figure 26 shows our context matching function. The matchContext function takes a set 𝒢 of role graphs and produces a set of pairs (G, μ), where G = (H, ρ, K, τ, E) is a role graph and μ is a homomorphism from H to H_ic. The homomorphism μ guarantees that α^{-1}(G) ⊆ α_0^{-1}(context(proc′))


matchContext(𝒢) = match({(G, nodes(G) × {⊥}) | G ∈ 𝒢})

match : 𝒫(RoleGraphs × (N ∪ {⊥})^N) → 𝒫(RoleGraphs × N^N)

match(Γ) =
  Γ_0 := {(G, μ) ∈ Γ | μ^{-1}(⊥) ≠ ∅};
  if Γ_0 = ∅ then return Γ;
  ((H, ρ, K, τ, E), μ) := choose Γ_0;
  Γ′ := Γ \ {((H, ρ, K, τ, E), μ)};
  paramnodes := {n | ∃i : ⟨proc, x_i, n⟩ ∈ H};
  inaccessible := onstage(H) \ paramnodes;
  n_0 := choose μ^{-1}(⊥);
  candidates := {n′ ∈ nodes(H_ic) |
                   ((n_0 ∉ inaccessible and ρ_ic(n′) = ρ(n_0)) or
                    (n_0 ∈ inaccessible and n′ ∉ read(proc′))) and
                   (∀⟨n_0, f, n⟩ ∈ H with μ(n) ≠ ⊥ : ⟨n′, f, μ(n)⟩ ∈ H_ic) and
                   (∀⟨n, f, n_0⟩ ∈ H with μ(n) ≠ ⊥ : ⟨μ(n), f, n′⟩ ∈ H_ic)};
  if candidates = ∅ then fail;
  if candidates = {n′_0}, K(n_0) = s, K_ic(n′_0) = i, μ^{-1}(n′_0) = ∅
  then match(Γ′ ∪ {(G′, μ[n_1 ↦ n′_0]) | (H, ρ, K, τ, E) ⊲ G′ instantiates n_0 into n_1})
  else n′_0 := choose {n′ ∈ candidates | K_ic(n′) = s or (K(n_0) = i and μ^{-1}(n′) = ∅)};
       match(Γ′ ∪ {((H, ρ, K, τ, E), μ[n_0 ↦ n′_0])});

Figure 26: The Context Matching Algorithm


since the homomorphism h_0 from Definition 28 can be constructed from the homomorphism h in Definition 22 by putting h_0 = μ ∘ h. This implies that it is legal to call proc′ with any concrete graph represented by G.


The algorithm in Figure 26 starts with empty maps μ = nodes(G) × {⊥} and extends μ until it is defined on all of nodes(G) or there is no way to extend it further. It proceeds by choosing a role graph (H, ρ, K, τ, E) and a node n_0 for which the mapping μ is not defined yet. It then finds candidates in the initial context that n_0 can be mapped to. The candidates are chosen to make sure that μ remains a homomorphism. The accessibility requirement (a procedure may see no nodes with an incorrect role) is enforced by making sure that nodes in inaccessible are never mapped onto nodes in read(proc′) of the callee. As long as this requirement holds, nodes in inaccessible can be mapped onto nodes of any role, since their role need not be correct anyway. We generally require that the set μ^{-1}(n′_0) for an individual node n′_0 in the initial context contain at most one node, and that this node be individual. In contrast, there might be many individual and summary nodes mapped onto a summary node. We relax this requirement by instantiating a summary node of the caller if, at some point, that is the only way to extend the mapping μ (this corresponds to the first recursive call in the definition of match in Figure 26).


The algorithm is nondeterministic in the order in which nodes to be matched are selected. One possible ordering of nodes is depth-first order in the role graph starting from parameter nodes. If some nondeterministic branch does not succeed, the algorithm backtracks. The function fails if all branches fail. In that case the procedure call is considered illegal and ⊥_G is returned. The algorithm terminates since every recursive call of match lexicographically increases the sorted list of numbers |μ[nodes(H)]| for ((H, ρ, K, τ, E), μ) ∈ Γ.
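The backtracking search can be sketched in Python for a single role graph. This is a simplification under stated assumptions: it omits instantiation of caller summary nodes, the correspondence between caller and callee parameter edges, and the choice of which graph to process next; node names are strings, kinds are "i" or "s", and all function names are hypothetical. It visits nodes in sorted order rather than depth-first from parameters, which affects only efficiency.

```python
def match_context(caller_edges, caller_role, caller_kind,
                  ctx_edges, ctx_role, ctx_kind, inaccessible, read):
    """Backtracking search for a homomorphism mu from caller nodes to
    initial-context nodes. Returns a dict, or None if every branch fails."""
    order = sorted(caller_role)
    mu = {}

    def candidates(n0):
        result = []
        for c in sorted(ctx_role):
            if n0 in inaccessible:
                if c in read:  # inaccessible nodes must not be read by the callee
                    continue
            elif ctx_role[c] != caller_role[n0]:
                continue
            # an individual context node takes at most one caller node,
            # and that caller node must itself be individual
            if ctx_kind[c] == "i" and (caller_kind[n0] != "i" or c in mu.values()):
                continue
            image = {**mu, n0: c}
            # every edge between already-mapped nodes must be preserved
            out_ok = all((c, f, image[m]) in ctx_edges
                         for s, f, m in caller_edges if s == n0 and m in image)
            in_ok = all((image[s], f, c) in ctx_edges
                        for s, f, m in caller_edges if m == n0 and s in image)
            if out_ok and in_ok:
                result.append(c)
        return result

    def extend(i):
        if i == len(order):
            return True
        n0 = order[i]
        for c in candidates(n0):
            mu[n0] = c
            if extend(i + 1):
                return True
            del mu[n0]  # backtrack and try the next candidate
        return False

    return dict(mu) if extend(0) else None
```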


7.3.2 Effect Instantiation 


The result of the matching algorithm is a set of pairs (G, μ) of role graphs and mappings. These pairs are used to instantiate procedure effects in each of the role graphs of the caller. Figure 30 gives rules for effect instantiation. The analysis first verifies that the region read by the callee is included in the region read by the caller. Then it uses the map μ to find the inverse image S of the performed effects. The effects in S are grouped by the source n and field f, and the effects on each field n.f are applied in sequence. There are three cases when applying an effect to n.f:


1. There is only one target node of the write in nodes(H), and the effect is a must write effect. In this case we do a strong update.


2. The condition in 1) is not satisfied, and the node n is offstage. In this case we conservatively add all relevant edges from S to H.


3. The condition in 1) is not satisfied, but the node n is onstage, i.e. it is a parameter node*. In this case there is no unique target for n.f, and we cannot add multiple edges either, as this would violate the invariant for onstage nodes. We therefore do a case analysis, choosing which effect was performed last. If there are no must effects that affect n, then we also consider the case where the original graph is unchanged.
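The three cases can be illustrated for a single group of effects on n.f. In this Python sketch, the role graph is reduced to a frozenset of (source, field, target) triples, targets is the set of caller nodes that the instantiated effects may store into n.f, and is_must is a simplifying flag meaning that the group contains a must effect; all names are hypothetical and roles are not updated here.

```python
def apply_field_effects(H, n, f, targets, is_must, onstage):
    """Apply the instantiated effects on n.f and return the resulting graphs."""
    old = {e for e in H if e[0] == n and e[1] == f}
    if len(targets) == 1 and is_must:
        # Case 1: unique target and a must write: strong update
        (t,) = targets
        return [frozenset((H - old) | {(n, f, t)})]
    if n not in onstage:
        # Case 2: offstage source: weak update, keep old edges and add new ones
        return [frozenset(H | {(n, f, t) for t in targets})]
    # Case 3: onstage (parameter) source: case split on the write performed last
    results = [frozenset((H - old) | {(n, f, t)}) for t in sorted(targets)]
    if not is_must:
        results.append(frozenset(H))  # the callee may not have written n.f at all
    return results
```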


7.3.3 Role Reconstruction 


Procedure effects approximate structural changes to the heap, but do not provide information about role changes for non-parameter nodes. We use the role reconstruction algorithm AR in Figure 27 to conservatively infer possible roles of nodes after the procedure call, based on role changes for parameters and global role definitions.

Role reconstruction first finds the set N_0 of all nodes that might be accessed by the callee, since these nodes might have their roles changed. Then it splits each node n ∈ N_0 into |R| different nodes ⟨n, r⟩, one for each role r ∈ R. The node ⟨n, r⟩ represents the subset of objects that were initially represented by n and have role r after the procedure executes. The edges between nodes in the new graph are derived by simultaneously satisfying 1) structural constraints between nodes of the original graph; and 2) global role constraints from the role reference diagram. The nodes ⟨n, r⟩ not connected to the parameter nodes are garbage collected in the role graph. In practice, we generate nodes ⟨n, r⟩ and edges on demand starting from parameters, making sure that they are reachable and satisfy both kinds of constraints.


8 Extensions 


This section presents two extensions of the basic role system. The first extension allows a statically unbounded number of aliases for objects. The second extension allows the analysis to verify more complex role changes. Additional ways of extending roles are given in [31].


8.1 Multislots


A multislot $(r', f) \in \mathrm{multislots}(r)$ in the definition of role $r$ allows any number of aliases $\langle o', f, o\rangle \in H_c$ for $\rho_c(o') = r'$ and $\rho_c(o) = r$. We require $\mathrm{multislots}(r)$ to be disjoint from all $\mathrm{slot}_i(r)$. To handle multislots in role analysis, we relax condition 5) in Definition 22 of the abstraction relation by allowing $h$ to map more than one concrete edge $\langle o', f, o\rangle$ onto an abstract edge $\langle n', f, n\rangle \in H$ terminating at an onstage node $n$, provided that $(\rho(n'), f) \in \mathrm{multislots}(\rho(n))$. The nodeCheck predicate and the expansion relation are then extended appropriately. Note that a role graph does not represent the exact number of references that fill each multislot. The analysis therefore does not attempt to recognize actions that remove the last reference from a multislot. Once an object plays a role with a multislot, all subsequent roles that it plays must also have the multislot.
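The relaxed counting condition can be sketched in a few lines. This is only an illustration under stated assumptions: concrete heap edges are `(obj, field, obj)` triples, the homomorphism `h` is a dict from objects to abstract nodes, `rho` gives roles of abstract nodes, and `multislots` maps a role to a set of `(role, field)` pairs. The function names are hypothetical, and only the multislot part of the abstraction relation is modeled, not the remaining conditions of Definition 22.

```python
from collections import Counter

def onstage_edges_ok(concrete_edges, h, onstage, rho, multislots):
    """Relaxed check: several concrete edges may map onto one abstract edge that
    ends at an onstage node only if (rho(n'), f) is a multislot of rho(n)."""
    counts = Counter((h[o1], f, h[o2]) for (o1, f, o2) in concrete_edges)
    for (n1, f, n2), k in counts.items():
        if n2 in onstage and k > 1:
            if (rho[n1], f) not in multislots.get(rho[n2], set()):
                return False
    return True

def multislots_disjoint(slots, multislots):
    """multislots(r) must not overlap any slot_i(r)."""
    return all(not (ms & slot)
               for r, ms in multislots.items()
               for slot in slots.get(r, []))

# Two Elem objects, summarized by node "e", both point to one onstage node "c".
h = {"o1": "e", "o2": "e", "o": "c"}
concrete = {("o1", "owner", "o"), ("o2", "owner", "o")}
rho = {"e": "Elem", "c": "Container"}
with_multi = onstage_edges_ok(concrete, h, {"c"}, rho,
                              {"Container": {("Elem", "owner")}})
without_multi = onstage_edges_ok(concrete, h, {"c"}, rho, {})
```

With the multislot declared, the two `owner` references into the onstage container are accepted; without it, the same heap is rejected because an ordinary slot admits at most one alias.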


Non-parameter onstage nodes are never affected by effects, as guaranteed by the matching algorithm.
role BufferNode {
  fields next : BufferNode | null;
  slots BufferNode.next | main.buffer;
  acyclic next;
}

role WorkNode {
  fields next : WorkNode | null;
  slots WorkNode.next | main.work;
  acyclic next;
}

procedure main()
  rootvar buffer : BufferNode | null,
          work : WorkNode | null;
  auxvar x, y;
{
  // create buffer and work lists

  // swap buffer and work
  x = buffer;
  y = work;
  buffer = y;
  work = x;
  setRoleCascade(x:WorkNode, y:BufferNode);
}


Figure 28: Example of a Cascading Role Change 


8.2 Cascading Role Changes 


In some cases it is desirable to change the roles of an entire set of offstage objects without bringing them onstage. We use the statement setRoleCascade($x_1 : r_1, \ldots, x_n : r_n$) to perform such a cascading role change of a set of nodes. The need for cascading role changes arises when roles encode reachability properties.


Example 32 Procedure main in Figure 28 has two root variables, buffer and work, each of which is the root of a singly linked acyclic list. Elements of the first list have the BufferNode role and elements of the second list have the WorkNode role. At some point the procedure swaps the root variables buffer and work, which requires all nodes in both lists to change their roles. These role changes are triggered by the setRoleCascade statement. The statement indicates new roles for onstage nodes, and the analysis cascades the role changes to offstage nodes.
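To see why every node must change role, the following sketch runs the swap of Example 32 on a small concrete heap. It is a concrete simulation, not the abstract analysis: it assumes, purely for illustration, that a node's role is determined by which root variable reaches it along next pointers, which mirrors the slot declarations BufferNode.next | main.buffer and WorkNode.next | main.work. The object names and helpers are hypothetical.

```python
def reachable(next_of, start):
    """Objects reachable from start along next pointers (acyclic list)."""
    out = []
    while start is not None:
        out.append(start)
        start = next_of[start]
    return out

def roles_by_root(roots, next_of):
    """Assign BufferNode or WorkNode according to the root that reaches a node."""
    role_for_root = {"buffer": "BufferNode", "work": "WorkNode"}
    roles = {}
    for var, head in roots.items():
        for obj in reachable(next_of, head):
            roles[obj] = role_for_root[var]
    return roles

next_of = {"b1": "b2", "b2": None, "w1": "w2", "w2": "w3", "w3": None}
roots = {"buffer": "b1", "work": "w1"}
before = roles_by_root(roots, next_of)

# Swap buffer and work through x and y, as in Figure 28.
x, y = roots["buffer"], roots["work"]
roots["buffer"], roots["work"] = y, x
after = roles_by_root(roots, next_of)

changed = sorted(o for o in before if before[o] != after[o])
```

Only `b1` and `w1` are referenced by `x` and `y` (onstage), yet `changed` contains all five objects: the offstage nodes `b2`, `w2`, and `w3` must have their roles changed as well, which is exactly what setRoleCascade cascades.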


Given a role graph $(H, \rho, K, E)$, a cascading role change finds a new valid role assignment $\rho'$ in which the onstage nodes have the desired roles and the roles of offstage nodes are adjusted appropriately. Figure 29 shows the abstract execution of the setRoleCascade statement. Here $\mathrm{neighbors}(n, H)$ denotes the set of nodes in $H$ adjacent to $n$. The condition $\mathrm{cascadingOk}(n, H, \rho, K, \rho')$ ensures that it is legal to change the role of node $n$ from $\rho(n)$ to $\rho'(n)$, given that the neighbors of $n$ also change role according to $\rho'$. This check resembles the check for the setRole statement in Section 6.2.3. Let $r = \rho(n)$ and $r' = \rho'(n)$. Then $\mathrm{cascadingOk}(n, H, \rho, K, \rho')$ requires the following conditions:


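1. $\langle n, f, n_1\rangle \in H$ implies $\rho'(n_1) \in \mathrm{field}_f(r')$.

2. $\mathrm{slotno}(r') = \mathrm{slotno}(r) = k$, and for every list $\langle n_1, f_1, n\rangle, \ldots, \langle n_k, f_k, n\rangle \in H$, if there is a permutation $p : \{1, \ldots, k\} \to \{1, \ldots, k\}$ such that $(\rho(n_i), f_i) \in \mathrm{slot}_{p(i)}(r)$ for all $i$, then there is a permutation $p' : \{1, \ldots, k\} \to \{1, \ldots, k\}$ such that $(\rho'(n_i), f_i) \in \mathrm{slot}_{p'(i)}(r')$ for all $i$.

3. Identity relations were already satisfied or can be explicitly checked: $(f, g) \in \mathrm{identities}(\rho'(n))$ implies

   (a) $(f, g) \in \mathrm{identities}(\rho(n))$, or
   (b) for all $\langle n, f, n'\rangle \in H$: $K(n') = \mathsf{i}$, and if $\langle n', g, n''\rangle \in H$ then $n'' = n$.

4. Either $\mathrm{acyclic}(\rho'(n)) \subseteq \mathrm{acyclic}(\rho(n))$ or $\mathrm{acycCheck}(n, (H, \rho', K), \mathrm{offstage}(H))$.

$$
\frac{
\begin{array}{c}
\langle \mathit{proc}, x_i, n_i\rangle \in H \qquad N_0 = \mu^{*}[\mathrm{read}(\mathit{proc}')] \\
\sigma : N_0 \times R \to N \text{ where the } \sigma(n, r) \text{ are pairwise distinct nodes fresh in } H \\
\rho' = \rho \setminus (N_0 \times R) \cup \{(\sigma(n, r), r) \mid n \in N_0,\ r \in R\} \setminus (\{n_i\}_i \times R) \cup \{(n_i, \mathrm{postR}_i(\mathit{proc}))\}_i \\
K'(\sigma(n, r)) = K(n) \qquad \tau'(\sigma(n, r)) = \tau(n) \qquad E' = E \\
\begin{aligned}
H_0 = {} & H \setminus \{\langle n_1, f, n_2\rangle \mid n_1 \in N_0 \text{ or } n_2 \in N_0\} \\
& \cup \{\langle \sigma(n_1, r_1), f, \sigma(n_2, r_2)\rangle \mid \langle n_1, f, n_2\rangle \in H,\ (r_1, f, r_2) \in \mathrm{RRD}\} \\
& \cup \{\langle n_1, f, \sigma(n_2, r_2)\rangle \mid \langle n_1, f, n_2\rangle \in H,\ (\rho(n_1), f, r_2) \in \mathrm{RRD}\} \\
& \cup \{\langle \sigma(n_1, r_1), f, n_2\rangle \mid \langle n_1, f, n_2\rangle \in H,\ (r_1, f, \rho(n_2)) \in \mathrm{RRD}\}
\end{aligned} \\
H' = \mathrm{GC}(H_0)
\end{array}
}{
((H, \rho, K, \tau, E), s) \xrightarrow{\ \mathrm{RR}\ } (H', \rho', K', \tau', E')
}
$$

Figure 27: Call Site Role Reconstruction

$$
\frac{
\begin{array}{c}
s = \mathtt{setRoleCascade}(x_1 : r_1, \ldots, x_n : r_n) \qquad \langle \mathit{proc}, x_i, n_i\rangle \in H \qquad \rho'(n_i) = r_i \\
\rho'(n) = \rho(n) \text{ for } n \in \mathrm{onstage}(H) \setminus \{n_i\}_i \\
N_0 = \{n \in \mathrm{offstage}(H) \mid \exists n' \in \mathrm{neighbors}(n, H) : \rho(n') \neq \rho'(n')\} \\
\forall n \in N_0 : \mathrm{cascadingOk}(n, H, \rho, K, \rho')
\end{array}
}{
(H, \rho, K, \tau, E) \xrightarrow{\ s\ } (H, \rho', K, \tau, E)
}
$$

Figure 29: Abstract Execution for setRoleCascade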
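Conditions 1 and 2 of cascadingOk listed above can be sketched as follows; the identity and acyclicity conditions (3 and 4) are deliberately omitted. The encodings are assumptions made for illustration: `fields[r][f]` is the set of roles allowed in field `f` of role `r`, `slots[r]` is a list of sets of `(role, field)` pairs so that `slotno(r)` is `len(slots[r])`, and a "list" of incoming edges is read as a selection of `k` distinct incoming edges. Because `k` is small in practice, the permutation in condition 2 is found by brute force with `itertools.permutations`; the function names are hypothetical.

```python
from itertools import combinations, permutations

def matches(pairs, slot_list):
    """Can the (role, field) pairs fill slot_list one-to-one, in some order?"""
    k = len(slot_list)
    return len(pairs) == k and any(
        all(pairs[i] in slot_list[p[i]] for i in range(k))
        for p in permutations(range(k)))

def cascading_ok_1_2(n, edges, rho, rho_new, fields, slots):
    """Conditions 1 and 2 of cascadingOk only (identities and acyclicity omitted)."""
    r, r_new = rho[n], rho_new[n]
    # Condition 1: every outgoing field points to a role allowed by field_f(r').
    for (a, f, b) in edges:
        if a == n and rho_new[b] not in fields[r_new].get(f, set()):
            return False
    # Condition 2: same slot count, and every k-tuple of incoming edges that
    # filled the old slots must also fill the new slots under rho_new.
    if len(slots[r]) != len(slots[r_new]):
        return False
    incoming = [(a, f) for (a, f, b) in edges if b == n]
    for chosen in combinations(incoming, len(slots[r])):
        old = [(rho[a], f) for (a, f) in chosen]
        new = [(rho_new[a], f) for (a, f) in chosen]
        if matches(old, slots[r]) and not matches(new, slots[r_new]):
            return False
    return True

# Roles from Figure 28, restricted to a two-node buffer list b1 -> b2.
fields = {"BufferNode": {"next": {"BufferNode", "null"}},
          "WorkNode": {"next": {"WorkNode", "null"}}}
slots = {"BufferNode": [{("BufferNode", "next"), ("main", "buffer")}],
         "WorkNode": [{("WorkNode", "next"), ("main", "work")}]}
edges = {("b1", "next", "b2")}
before = {"b1": "BufferNode", "b2": "BufferNode"}
both_swapped = {"b1": "WorkNode", "b2": "WorkNode"}
only_b2 = {"b1": "BufferNode", "b2": "WorkNode"}
```

Changing `b2` to WorkNode is accepted when its predecessor `b1` changes too, but rejected when `b1` keeps the BufferNode role, since a BufferNode.next reference cannot fill the WorkNode slot.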


In general, zero, one, or several solutions may satisfy the constraints for a given cascading role change. Selecting any solution that satisfies the constraints is sound with respect to the original semantics. A useful heuristic for searching the solution space is to first explore branches in which as few roles as possible change. If no solution is found, the analysis reports an error.
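The fewest-changes-first heuristic can be sketched as an enumeration ordered by the number of offstage nodes whose role changes. This is a brute-force illustration, exponential in the number of offstage nodes, and not a description of how the analysis is implemented. The predicate `ok` is a hypothetical stand-in for checking cascadingOk on every affected node, and all names are assumptions.

```python
from itertools import combinations, product

def fewest_changes(rho, offstage, roles, requested, ok):
    """Search role assignments, changing as few offstage roles as possible.

    rho: current role assignment; requested: onstage node -> role from
    setRoleCascade; ok: predicate on a complete candidate assignment.
    Returns the first accepted assignment, or None (an error is reported).
    """
    base = dict(rho)
    base.update(requested)
    nodes = sorted(offstage)
    for k in range(len(nodes) + 1):           # first 0 changes, then 1, ...
        for chosen in combinations(nodes, k):
            alternatives = [[r for r in roles if r != rho[n]] for n in chosen]
            for new_roles in product(*alternatives):
                cand = dict(base)
                cand.update(zip(chosen, new_roles))
                if ok(cand):
                    return cand
    return None

# Example 32: swap the roles of the onstage heads b1 and w1.
next_edges = [("b1", "b2"), ("w1", "w2"), ("w2", "w3")]
rho0 = {"b1": "BufferNode", "b2": "BufferNode",
        "w1": "WorkNode", "w2": "WorkNode", "w3": "WorkNode"}

def same_role_along_next(cand):
    # Stand-in constraint: a next field links nodes of the same role.
    return all(cand[a] == cand[b] for a, b in next_edges)

solution = fewest_changes(rho0, {"b2", "w2", "w3"},
                          ["BufferNode", "WorkNode"],
                          {"b1": "WorkNode", "w1": "BufferNode"},
                          same_role_along_next)
```

Every smaller set of changes leaves some next edge linking nodes of different roles, so the search only succeeds once all three offstage nodes change; if `w2` and `w3` were not allowed to change, it would return `None`.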


9 Related Work 


Typestate, as a type system extension for statically verifying dynamically changing properties, was proposed in [44, 43]. Aliasing causes problems for typestate-based systems because the declared typestates of all aliases must change whenever the state of the referred object changes. Faced with the complexity of aliasing, [44] resorted to a more controlled language model that avoids aliasing. More recently proposed typestate approaches use linear types for heap references to support state changes of dynamically allocated objects without addressing aliasing issues [10].

Motivated by the need to enforce safety properties in low-level software systems, [42, 46, 9] use extensions of linear types to describe aliasing of objects and rely on language design to avoid non-local type inference. These systems take a construction-based approach that specifies data structures as unfoldings of basic elaboration steps [46]. As with shape types [15, 14] and graph types [29, 34], this allows tree-like data structures to be expressed more precisely than with our roles, but cannot approximate data structures such as sparse matrices. More importantly, this approach makes it difficult to express nodes that are members of multiple data structures. Handling multiple data structures is the essential ingredient of our approach, because the role of an object depends on the data structures in which it participates.

Like shape analysis techniques [5, 17, 39, 40], we have therefore adopted the constraint-based approach, which characterizes data structures in terms of the constraints that they satisfy. The constraint-based approach allows us to handle a wider range of data structures while giving up some precision. Like [47, 48], we perform non-local inference of program properties, but while [47, 48] focus on linear integer constraints and handle recursive data structures conservatively, we do not handle integer arithmetic but have a more precise representation of the heap. At a higher level, these approaches all focus on detailed properties of individual data structures. We view our research as focusing more on global aspects such as the participation of objects in multiple data structures.

The path matrix approaches [18, 17] have been used to implement efficient interprocedural analyses that infer one level of referencing relationships, but are not sufficiently precise to track must aliases of heap objects for programs with destructive updates of more complex data structures.

The use of the instantiation relation in role analysis is analogous to the materialization operation of [39, 40]. Role analysis can also track reachability properties, but we use an abstraction relation based on graph homomorphism rather than 3-valued logic. Our split operation achieves a similar goal to the focus operation of [40]. However, the generic focus algorithm of [32] cannot handle the reachability predicate, which is needed for our split operation, because it conservatively refuses to focus on edges between two summary nodes to avoid generating an infinite number of structures. Rather than requiring definite values for the reachability predicate, our role analysis splits by reachability properties in the abstract role graph, which illustrates the flexibility of the homomorphism-based abstraction relation. Another difference with [40] is that our role analysis does not require the developer to supply the predicate update formulae for instrumentation predicates.
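To make "abstraction relation based on graph homomorphism" concrete, the following sketch checks a simplified version of such a relation. It is an assumption-laden illustration, not Definition 22 itself: it checks only that roles are preserved, that every concrete edge maps onto an abstract edge, and that each individual node (kind `"i"`) stands for exactly one object; conditions such as the onstage-edge condition 5 are not modeled. The encodings and the name `abstracts` are hypothetical.

```python
from collections import Counter

def abstracts(concrete_edges, concrete_roles, h, abstract_edges, rho, kind):
    """Simplified check that h maps a concrete heap onto an abstract role graph."""
    # (1) Role preservation: each object has the role of its abstract node.
    if any(rho[h[o]] != r for o, r in concrete_roles.items()):
        return False
    # (2) Homomorphism: every concrete edge maps onto an abstract edge.
    if any((h[a], f, h[b]) not in abstract_edges for (a, f, b) in concrete_edges):
        return False
    # (3) Individual nodes represent exactly one object; summary nodes any number.
    preimages = Counter(h.values())
    return all(preimages[n] == 1 for n, k in kind.items() if k == "i")

# A three-element work list: the head is an individual node, the rest a summary.
concrete = {("main", "work", "w1"), ("w1", "next", "w2"), ("w2", "next", "w3")}
roles_c = {"main": "Proc", "w1": "WorkNode", "w2": "WorkNode", "w3": "WorkNode"}
h = {"main": "proc", "w1": "hd", "w2": "rest", "w3": "rest"}
abstract = {("proc", "work", "hd"), ("hd", "next", "rest"), ("rest", "next", "rest")}
rho = {"proc": "Proc", "hd": "WorkNode", "rest": "WorkNode"}
kind = {"proc": "i", "hd": "i", "rest": "s"}
ok = abstracts(concrete, roles_c, h, abstract, rho, kind)
```

The self-loop on the summary node `rest` is what lets the `w2 -> w3` edge map into the abstract graph; removing it makes the check fail.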

A precise interprocedural analysis [38] extends shape anal- 
ysis techniques to treat activation records as dynamically al- 
located structures. The approach also effectively synthesizes 
an application-specific set of contexts. Our approach differs 
in that it uses a less precise but more scalable treatment of 
procedures. It also uses a compositional approach that an- 
alyzes each procedure once to verify that it conforms to its 
specification. Like [48] our interprocedural analysis can ap- 
ply both may and must effects, but our contexts are general 
graphs with summary nodes and not trees. 

Roles are similar to the ADDS and ASAP data structure 
description languages [25, 26, 23]. These systems use sound 
techniques to apply the data structure invariants for paral- 
lelization and general dependence testing but do not verify 
that the data structure invariants are preserved by destruc- 
tive updates of data structures [24]. 

The object-oriented community has long been aware of the benefits that dynamically changing classes bring to large systems [37]. Recognizing these benefits, researchers have proposed dynamic techniques that change the class of an object to reflect its state changes [16, 20, 4, 13]. These systems illustrate the need for a static system that can verify the correct use of objects with changing roles.


10 Conclusion 


This paper proposes two key ideas: aliasing relationships 
should determine, in large part, the state of each object, 
and the type system should use the resulting object states 
as its fundamental abstraction for describing procedure in- 
terfaces and object referencing relationships. We present a 
role system that realizes these two key ideas in a concrete 
system, and present an analysis algorithm that can verify 
that the program correctly respects the constraints of this 
role system. The result is that programmers can use roles 
for a variety of purposes: to ensure the correctness of ex- 
tended procedure interfaces that take the roles of parameters 
into account, to verify important data structure consistency 
properties, to express how procedures move objects between 
data structures, and to check that the program correctly implements correlated relationships between the states of multiple objects. We therefore expect roles to improve the reliability of the program and its transparency to developers
and maintainers. 
[Figure: inference rules for effect instantiation, including read-effect conditions of the form $\tau[\mu^{*}[\mathrm{read}(\mathit{proc}')]] \subseteq \mathrm{read}(\mathit{proc})$ and $\tau[\mu^{*}[\mathrm{read}(\mathit{proc}')]] \not\subseteq \mathrm{read}(\mathit{proc})$, and a Single Write Effect Instantiation rule that distinguishes a deterministic effect from the case where $\{\langle \tau(n), f, \tau(n_1)\rangle \mid \langle n, f, n_1\rangle \in S\} \not\subseteq \mathrm{mayWr}(\mathit{proc})$ or $|\{n_1 \mid \langle n, f, n_1\rangle \in S\}| > 1$.]

Figure 30: Effect Instantiation